We construct here a abundance correlation based network via WGCNA on CLR transformed 16S data and visualize different aspects in integrated heatmaps and direct correlation to environmental and physiological parameters. The Output is directly handed over to Cytoscape which are # here to allow for knitting Picurst was run on an HPC but seems not informative with many undescribed or lowly described bacterial taxa in this estuarine fish gill dataset.
before you start: .rs.restartR()
#-
Some seasonal networks as conststuced here:
https://journals.asm.org/doi/full/10.1128/spectrum.02943-23
Finally, three metrics have been extracted from the networks with the function “Network Analyzer” in Cytoscape: degree (DG), neighborhood connectivity (NC), and closeness centrality (CC). Those network topological parameters allow a better understanding of the dynamics in different networks. DG is the number of edges connected from one node to another (184). The more a node has edges, the more it is locally connected, indicating its relevance in the network (185). CC is a qualitative measure that indicates if a node is close to the other nodes in the network (186). The shortest path length to spread information from one node to another is represented, and it indicates if the node has an important influence in the network and how it can interact with other nodes (184). Finally, NC is a quantitative measure that calculates the average connectivity of a node to the other nodes in the network. It indicates how the node impacts in the network dynamics (187).
works, just un-# the whole chunk
closeness centrality (CC). CC is a qualitative measure that indicates if a node is close to the other nodes in the network (186). The shortest path length to spread information from one node to another is represented, and it indicates if the node has an important influence in the network and how it can interact with other nodes (184).
Finally, NC is a quantitative measure that calculates the average connectivity of a node to the other nodes in the network. It indicates how the node impacts in the network dynamics (187).
WGCNA_list <- list()
for (Dataset in c(names(pslist)[!grepl("WF", names(pslist))][grepl("ps_GC|ps_OE", names(pslist)[!grepl("WF", names(pslist))])], "ps_WF")) {
require(WGCNA)
require(plyr)
require(dplyr)
require(ggrepel)
require(cowplot)
require(phyloseq)
Datasetname <- sub("ps_", "", Dataset)
ps <- pslist[[Dataset]]
psraw <-pslistraw[[Dataset]]
WGCNA_list_length <- length(WGCNA_list)
Analysis <- "WGCNA"
prefix <- "SSU-"
if (Datasetname %in% c("OE", "GC")) {
traitData <- c(
"HSI", "SSI", "GSI", "FCF", "FillLevel", "Age", "Length",
"NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp", "SPM", "Salinity"
) } else if (Datasetname %in% c("WF")) {
traitData <- c(
"NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp", "SPM", "Salinity"
) }
omics_data0 <- as.data.frame(otu_table( microbiome::transform(ps, "clr"))) #%>% t()
WGCNA_list[[Datasetname]] <- list()
WGCNA_list[[Datasetname]][["plots"]] <- list()
WGCNA_list[[Datasetname]][["ps"]] <- pslist[[Dataset]]
WGCNA_list[[Datasetname]][["psraw"]] <- pslistraw[[Dataset]]
WGCNA_list[[Datasetname]][["omics_data0"]] <- omics_data0
WGCNA_list[[Datasetname]][["traitData"]] <- traitData
}
## Loading required package: WGCNA
## Warning: package 'WGCNA' was built under R version 4.3.1
## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
## Warning: package 'fastcluster' was built under R version 4.3.1
##
## Attaching package: 'fastcluster'
## The following object is masked from 'package:stats':
##
## hclust
##
##
## Attaching package: 'WGCNA'
## The following object is masked from 'package:stats':
##
## cor
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 4.3.1
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
## Loading required package: ggrepel
## Warning: package 'ggrepel' was built under R version 4.3.1
## Loading required package: cowplot
## Warning: package 'cowplot' was built under R version 4.3.1
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
##
## stamp
## Loading required package: phyloseq
Some comments on normalization: - Normalization: Strand et al., 2021 4.2.1. Data preprocessing Transcript expression counts were summed for each gene. Expression values were normalized across samples using the Trimmed mean of M values (TMM)-method [33], and log2- transformed. Genes with expression levels below 1.0 in all samples and/or with a standard deviation less than 0.15 were removed before network inference. This reduced the number of genes from 48,057 to 37,408. We retained OTUs that contribute at least 0.005% of the total microbial abundance. This reduced the number of OTUs from 1152 to 296 and the number of zeros in the abundance table by 50%. OTU abundances was normalized using the Cumulative Sum Scaling (CSS) method from the Bioconductor package metagenome- Seq, which by default log2-transforms the data.
Swift et al. 2021: Cumulative-Sum Score (CSS) uses robust statistics to provide an alternative to TSS that is less influenced by preferentially sampled taxa. Its normalization scaling factor is defined as the cumulative sum of the observed counts up to a threshold which is determined using a heuristic that minimizes the influence of the preferentially sampled taxa. The idea is that CSS scales each sample using only the part of the counts distribution that is relatively invariant (H. Lin &Peddada, 2020a; Weiss et al., 2017). However, the determination of the thresholds could fail due to high count variability (J. Chen et al., 2018). CSS or TSS do not account for sparsity or address compositionality.
The DESeq scaling factor of the observed abundances for each sample is computed as the median of all the ratios between counts of the sample and the reference. Assuming most taxa are not differentially abundant, the median of these ratios would provide an estimate of a correction factor for all the read counts. Like most normalizations, TMM and DESeq rely on the strong assumptions that most taxa are not differentially abundant, and of those differentially abundant, there is an approximate balanced amount of over-abundance and under-abundance (Calza & Pawitan, 2010; Dillies et al., 2012; Gloor et al., 2017; Weiss et al., 2017). These may be reasonable assumptions for gene expression data but may not be valid for microbiome data which tends to be diverse. Also, while TMM and DESEq normalization are designed for compositional data, they do not address sparsity. In fact, large fractions of zeros and relatively low sequencing depths can be problematic for both methods and can lead to serious biases (Kumar et al., 2018). As stated previously, adding pseudo-counts to the original count data will not rectify the problem.
Large scale tool benchmarking by Thorsen et al. has revealed that most of the commonly used differential (relative) abundance testing methods including edgeR and DESeq are sensitive to sparsity (Thorsen et al., 2016) and consequently exhibit unacceptably high false positive rates (Hawinkel et al., 2019). Also, recent investigations by H. Lin and Peddada (2020a, 2020b) indicate poor performance of these methods for microbiome data. However, based on simulations and analytical derivations, they attribute the poor performance to the violation of the assumption that most taxa do not change. As a result, the false discovery rate is inflated, and it increases as the sample size increases. Similar behavior of these methods was also seen by Weiss et al. (2017). Thus, H. Lin and Peddada (2020a) do not advocate the use of edgeR and DESeq for the analysis of microbiome data.
following http://mixomics.org/mixmc/mixmc-preprocessing/ Arumugam et al., (2011)
for (Datasetname in names(WGCNA_list)) {
psraw <- WGCNA_list[[Datasetname]]$psraw
library(phyloseq)
library(tidyverse)
require(DESeq2)
require(SummarizedExperiment)
low.count.removal <- function(
data, # OTU count df of size n (sample) x p (OTU)
percent=0.005 # cutoff chosen
) {
keep.otu = which(colSums(data)*100/(sum(colSums(data))) > percent)
data.filter = data[,keep.otu]
return(list(data.filter = data.filter, keep.otu = keep.otu))}
#######################################################
#Plot consequence of filtering from Strand et al, 2021#
#######################################################
frac_zero <- c()
all_sum <- c()
num_otu <- c()
seqp <- seq(0,0.1,0.0001)
for(i in seqp){
result.filter = low.count.removal(otu_table(psraw), percent=i)
length(result.filter$keep.otu)
otu.f = result.filter$data.filter
frac_zero <- c(frac_zero, sum(otu.f == 00)/ (ncol(otu.f)*nrow(otu.f)))
all_sum <- c(all_sum, sum(otu.f))
num_otu <- c(num_otu, ncol(otu.f))
}
names(frac_zero) <- seqp
names(all_sum) <- seqp
applied_filter <- 0.005
# Get all_sum and num_otu to a fraction of the total
mas <- max(all_sum)
mno <- max(num_otu)
all_sum %>% (function(x){x/max(x)}) -> all_sum
num_otu %>% (function(x){x/max(x)}) -> num_otu
data.frame(filter = seqp,
Percent.zeros = frac_zero*100,
Total.abundance = all_sum*100,
Number.of.OTUs = num_otu*100) %>%
tidyr::gather(key = "Type", value = "percent.of.total", -filter) %>%
ggplot() +
geom_line(mapping = aes(x = filter, y = percent.of.total, col = Type)) +
geom_vline(xintercept = applied_filter, alpha = 0.5, color = "red", linetype="dashed")+
ylab("Percent of total") +
xlab(paste("Filter at", applied_filter)) +
theme(legend.title = element_blank()) -> FilterZerosPlot
WGCNA_list[[Datasetname]][["plots"]][["FilterZerosPlot"]] <- FilterZerosPlot
plot(FilterZerosPlot)
##############
#Plot clr PCR#
##############
TSE <-mia::makeTreeSummarizedExperimentFromPhyloseq(microbiome::transform(WGCNA_list[[Datasetname]]$ps, "clr"))
tryCatch({
SSUclrPCAPlot<-pcaplotRK3(TSE,intgroup = c("Reps"), pcX = 1, pcY = 2, title="",ellipse = TRUE, ellipse.prob = 0.5) +
scale_fill_manual(values=col.Palette$col.Palette.Reps) + #col.Palette.SeqCenter #col.Palette.Cruises
scale_color_manual(values=col.Palette$col.Palette.Reps) + atheme +
theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
theme(
panel.grid.major = element_line(colour = "grey50"),
panel.grid.minor = element_line(colour = "grey50"))
prow <- cowplot::plot_grid(SSUclrPCAPlot, labels = c(""), ncol = 1)
title <- ggdraw() + draw_label_themeRKwhite(paste(Datasetname), element = "plot.title",x = 0.05, hjust = 0, vjust = 1)
subtitle <- ggdraw() + draw_label_themeRKwhite(paste("clr-PCA", "with",
length(rownames(TSE)),"bacterial species",sep = " "), element = "plot.subtitle",x = 0.05, hjust = 0,
vjust = 1)
SSUclrPCAPlot<- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0, 0.05, 0.989))
ggsave(SSUclrPCAPlot, filename = paste(paste(save_name, "SSU", "Filter-clrPCA", Datasetname, sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 7,
height = 7)
plot(SSUclrPCAPlot)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
WGCNA_list[[Datasetname]][["plots"]][["SSUclrPCAPlot"]] <- SSUclrPCAPlot
}
## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."
## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."
## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."
for (Datasetname in names(WGCNA_list)) {
omics_data0 <- WGCNA_list[[Datasetname]]$omics_data0
ps <- WGCNA_list[[Datasetname]]$ps
traitData <- WGCNA_list[[Datasetname]][["traitData"]]
#Import Data
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
dim(omics_data0 %>% paste(c("Samples", "OTUs")))
gsg = goodSamplesGenes(omics_data0, verbose = 3);
gsg$allOK
#Check Data Quality
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:", paste(names(omics_data0)[!gsg$goodGenes], collapse = ", ")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:", paste(rownames(omics_data0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
omics_data0 = omics_data0[gsg$goodSamples, gsg$goodGenes]
}
#Plot Euclidean Sample Distance
sampleTree = hclust(dist(omics_data0), method = "average");
# Plot the sample tree: Open a graphic output window of size 12 by 9 inches
# The user should change the dimensions if the window is too large or too small.
#pdf(file = "Plots/sampleClustering.pdf", width = 12, height = 9);
par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 100, col = "red");
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 100, minSize = 10)
table(clust)
# clust 1 contains the samples we want to keep.
#in Case we would exclude outliers from the tree:
#keepSamples = (clust==1)
#omics_data = omics_data0[keepSamples, ]
recordPlot() -> SampleClusteringTreePlot
WGCNA_list[[Datasetname]][["plots"]][["SampleClusteringTreePlot"]] <- SampleClusteringTreePlot
###############################
#Subset samples when excluding#
###############################
#Keep all samples
omics_data <- omics_data0
nGenes = ncol(omics_data)
nSamples = nrow(omics_data)
#Clean TraitData
datTraits <-SAMDF16S %>%
filter(SampleID %in% rownames(omics_data))
rownames(datTraits) <- datTraits$SampleID
datTraits<- datTraits[,traitData]
print("NAs in TraitData")
print(table(is.na(datTraits)))
print((datTraits[!complete.cases(datTraits), ]))
datTraits <- datTraits[complete.cases(datTraits), ]
dim(datTraits)
omics_data <- omics_data[rownames(omics_data) %in% rownames(datTraits),]
dim(omics_data)
ps <- prune_samples(sample_names(ps)[sample_names(ps) %in% rownames(omics_data)], ps)
print(Datasetname)
print(ps)
collectGarbage()
# Re-cluster samples
sampleTree2 = hclust(dist(omics_data), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE)
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")
recordPlot() -> SampleClusteringTreeTraitsPlot
WGCNA_list[[Datasetname]][["plots"]][["SampleClusteringTreeTraitsPlot"]] <- SampleClusteringTreeTraitsPlot
WGCNA_list[[Datasetname]][["omics_data"]] <- omics_data
WGCNA_list[[Datasetname]][["datTraits"]] <- datTraits
WGCNA_list[[Datasetname]]$ps <- ps
}
## Flagging genes and samples with too many missing values...
## ..step 1
## [1] "NAs in TraitData"
##
## FALSE
## 2208
## [1] HSI SSI GSI FCF FillLevel Age Length
## [8] NH4 NO2 NO3 O2 PO4 TOC Temp
## [15] SPM Salinity
## <0 rows> (or 0-length row.names)
## [1] "OE"
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 1552 taxa and 138 samples ]
## sample_data() Sample Data: [ 138 samples by 61 sample variables ]
## tax_table() Taxonomy Table: [ 1552 taxa by 7 taxonomic ranks ]
## refseq() DNAStringSet: [ 1552 reference sequences ]
## Flagging genes and samples with too many missing values...
## ..step 1
## [1] "NAs in TraitData"
##
## FALSE
## 1408
## [1] HSI SSI GSI FCF FillLevel Age Length
## [8] NH4 NO2 NO3 O2 PO4 TOC Temp
## [15] SPM Salinity
## <0 rows> (or 0-length row.names)
## [1] "GC"
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 1344 taxa and 88 samples ]
## sample_data() Sample Data: [ 88 samples by 61 sample variables ]
## tax_table() Taxonomy Table: [ 1344 taxa by 7 taxonomic ranks ]
## refseq() DNAStringSet: [ 1344 reference sequences ]
## Flagging genes and samples with too many missing values...
## ..step 1
## [1] "NAs in TraitData"
##
## FALSE
## 171
## [1] NH4 NO2 NO3 O2 PO4 TOC Temp SPM
## [9] Salinity
## <0 rows> (or 0-length row.names)
## [1] "WF"
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 2364 taxa and 19 samples ]
## sample_data() Sample Data: [ 19 samples by 61 sample variables ]
## tax_table() Taxonomy Table: [ 2364 taxa by 7 taxonomic ranks ]
## refseq() DNAStringSet: [ 2364 reference sequences ]
saveRDS(WGCNA_list, file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list", Date, sep="_"), ".rds", sep=""))))
##############
#Summary Plot#
##############
for (Datasetname in names(WGCNA_list)) {
plots <- WGCNA_list[[Datasetname]][["plots"]]
cowplot::plot_grid(plots$FilterZerosPlot , plots$SSUclrPCAPlot, labels = c("A", "B"), ncol = 2,
rel_heights = c(1,1)) -> part_1
cowplot::plot_grid(part_1 , plots$SampleClusteringTreeTraitsPlot, labels = c("", "C"), ncol = 1) -> part_2
ggsave(part_2, filename = paste(paste(save_name, "SSU_DataInputPlot", Datasetname, Date, sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 10,
height = 12)
}
Rosales et al., 2023 Network analysis To identify ASVs that co-associate in AH, DU, and DL samples, CLR-transformed counts were used for weighted correlation network analysis (WGCNA) with the WGCNA 1.70-3R package [76]. The network was constructed unsigned with the following parameters: power = 3, minimum module size = 12, deep split = 2, and merged cut height = 0.25. The eigenvalues were correlated to AH, DU, and DL using Pearson correlation with the R function cor. The highest correlation in each disease state was then selected for network construction using the R package SpiecEasi 1.0.5 [77]. The network was then constructed as previously reported [11]. Briefly, the Stability Approach to Regularization Selection (StARS) [77] model was chosen along with the method Meinshausen–Bühlmann’s neighborhood selection [78]. For StARS, 100 subsamples were used with a variability threshold of 10−3. The centrality (node importance) was evaluated [79] using the functions centrality_degree (neighbors = the number of adjacent edges or neighbors) and centrality_edge_betweenness (centrality = the number of shortest paths going through an edge) [80]. The package influenceR 0.1.0. [81] selected important ASVs in each network and assigned the top “key players” [38], which were labeled with their respective orders.
Jameson et al., 2023 Co-occurrence patterns between taxa with putative roles in N2O production and the rest of the microbial community ASVs were explored using proportionality analysis within the propr package52. ASV tables were trimmed to select taxa that occurred ≥10 times in at least 10% of samples prior to network-level analyses to improve interpretability and minimize the risk of spurious correlations. Pairwise interactions between individual taxa with rho values greater than 0.60 were plotted using Cytoscape v3.9.0 and network topological indices were calculated using the NetworkAnalyzer tool114. Relationships between microbial community structure and rate processes were then assessed using weighted gene correlational network analysis (WGCNA) performed with the WGCNA package115. The signed adjacency measure was first calculated for each pair of features (ASVs) by raising the absolute value of their pairwise correlation coefficients to a soft-thresholding power of 8 to maximize the scale-free topology fit. Hierarchical clustering of taxa into discrete subnetworks was completed using a minimum module size threshold of 20 and a dissimilarity threshold of 0.3. Pearson correlation coefficients and corresponding p-values are reported for correlations between sample traits, subnetwork eigengenes, and individual ASVs (Supplementary Data 1). Subnetwork membership and intranetwork connectivity measures are also reported for each ASV and were used in further analyses to assess broad relationships between ASV connectivity and importance with respect to N2O production rates.
Mach et al., 2021 To confirm the results of the DE analysis, the weighted gene co-expression network analysis (WGCNA) method was also run on the “E1” matrix using the WGCNA R package (version 1.69) (Langfelder and Horvath, 2008). The parameters for the analysis were set as follows: “corFnc” = bicor, “type” = signed hybrid, “beta” = 10, “deepSplit” = 4, “minClusterSize” = 30, “cutHeight” = 0.1. The eigengenes corresponding to each identified module were correlated individually to all the 1H NMR and biochemical assay metabolites, i.e., a set of 56 different molecules (see next paragraphs). A module was then considered positively or negatively associated to this set of molecules if the Pearson r correlation values were ≥ |0.65| for at least 5 molecules and if all the corresponding p-values were ≤ 1e−05. The positively and the negatively correlated modules defined in this way were merged to obtain a single gene list, which was subsequently compared to the differentially expressed genes (DEGs) list using a Venn diagram.
Strand et al, 2021 4.2.2. Network inference For network inference, we used the Weighted Gene Co-expression Network Analysis (WGCNA) R package [24] and the function blockwiseModules with the bicor correlation measure and parameters maxBlockSize = 10000, networkType = “signed”, TOMType = “signed”, corType = “bicor”, maxPOutliers = 0.05, replaceMissingAdjacencies = TRUE, pamStage = F, deepSplit = 4, minModuleSize = 2, minKMEtoStay = 0.5, minCoreKME = 0.5, minCoreKMESize = 2, reassignThreshold = 0 mergeCutHeight = 0.4/0.5 (for microbiota/host respectively).
Our analysis relies heavily on network modules, and hence the parameters related to module detection and trimming influence the results. Briefly, deepSplit controls the sensitivity of the module detection approach by hierarchical clustering, with a value of 1 being the least sensitive and 4 being the most sensitive. minModuleSize controls the minimum size of modules in the clustering step. Nodes with a correlation to the module eigennode (KME) lower than minKMEtoStay are trimmed from the module, and the module is deleted if it does not have a core of at least minCoreKMESize nodes (with core nodes being defined as having a KME greater than minCoreKME). Finally, different modules with eigennodes that correlate above the 1 – mergeCutHeight threshold are merged. Note that the final modules can be smaller than minModuleSize due to trimming (but not smaller than minCoreKMESize), and that they can include nodes with a KME lower than minKMEtoStay due to module merging. Our parameters were set to detect highly correlated and potentially small modules initially, thus not missing interesting profiles displayed by few genes/OTUs, and then to apply an aggressive merging threshold to avoid dealing with highly redundant modules in downstream analysis.
for (Datasetname in names(WGCNA_list)) {
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
#Import Data
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
allowWGCNAThreads() #needed otherwise would not work! https://www.biostars.org/p/122349/
sft = pickSoftThreshold(omics_data, powerVector = powers, verbose = 5)
# Plot the results:
# Find the soft thresholding power beta to which co-expression similarity is raised to calculate adjacency.
# based on the criterion of approximate scale-free topology.
idx <- min(which((-sign(sft$fitIndices[,3])*sft$fitIndices[,2]) > 0.90))
if(is.infinite(idx)){
idx <- min(which((-sign(sft$fitIndices[,3])*sft$fitIndices[,2]) > 0.80))
if(!is.infinite(idx)){
st <- sft$fitIndices[idx,1]
} else{
idx <- which.max(-sign(sft$fitIndices[,3])*sft$fitIndices[,2])
st <- sft$fitIndices[idx,1]
}
} else{st <- sft$fitIndices[idx,1]}
# Plot Scale independence measure and Mean connectivity measure
# Scale-free topology fit index as a function of the soft-thresholding power
data.frame(Indices = sft$fitIndices[,1],
sfApprox = -sign(sft$fitIndices[,3])*sft$fitIndices[,2]) %>%
ggplot() +
geom_hline(yintercept = 0.9, color = "red", alpha = 0.6) + # corresponds to R^2 cut-off of 0.9
geom_hline(yintercept = 0.8, color = "red", alpha = 0.2) + # corresponds to R^2 cut-off of 0.8
geom_line(aes(x = Indices, y = sfApprox), color = "red", alpha = 0.1, size = 2.5) +
geom_text(mapping = aes(x = Indices, y = sfApprox, label = Indices), color = "red", size = 4) +
ggtitle("Scale independence") +
xlab("Soft Threshold (power)") +
ylab("SF Model Fit,signed R^2") +
xlim(1,20) +
ylim(-1,1) +
geom_segment(aes(x = st, y = 0.25, xend = st, yend = sfApprox[idx]-0.05),
arrow = arrow(length = unit(0.2,"cm")),
size = 0.5)-> scale_independence_plot
WGCNA_list[[Datasetname]][["plots"]][["scale_independence_plot"]] <-scale_independence_plot
# Mean connectivity as a function of the soft-thresholding power
data.frame(Indices = sft$fitIndices[,1],
meanApprox = sft$fitIndices[,5]) %>%
ggplot() +
geom_line(aes(x = Indices, y = meanApprox), color = "red", alpha = 0.1, size = 2.5) +
geom_text(mapping = aes(x = Indices, y = meanApprox, label = Indices), color = "red", size = 4) +
xlab("Soft Threshold (power)") +
ylab("Mean Connectivity") +
geom_segment(aes(x = st-0.4,
y = sft$fitIndices$mean.k.[idx],
xend = 0,
yend = sft$fitIndices$mean.k.[idx]),
arrow = arrow(length = unit(0.2,"cm")),
size = 0.4) +
ggtitle(paste0("Mean connectivity: ",
round(sft$fitIndices$mean.k.[idx],2))) -> mean_connectivity_plot
cowplot::plot_grid(scale_independence_plot, mean_connectivity_plot, ncol = 2, align = "h",
labels = c("A", "B")) -> si_mc_plot
WGCNA_list[[Datasetname]][["plots"]][["si_mc_plot"]] <- si_mc_plot
ggsave(si_mc_plot, filename = paste(paste(save_name, "soft_thresholding_power" , Datasetname,
sep="_"),".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)
si_mc_plot
softPower = st; print(paste("SoftPower chosen to", softPower))
WGCNA_list[[Datasetname]][["softPower"]] <- softPower
}
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 1552.
## pickSoftThreshold: calculating connectivity for given powers...
## ..working on genes 1 through 1552 of 1552
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.194 -1.080 0.586 231.000 2.21e+02 451.0
## 2 2 0.739 -1.530 0.742 59.600 4.98e+01 206.0
## 3 3 0.932 -1.570 0.915 22.200 1.44e+01 125.0
## 4 4 0.917 -1.420 0.914 11.100 4.94e+00 90.1
## 5 5 0.909 -1.280 0.908 6.750 1.90e+00 72.0
## 6 6 0.907 -1.160 0.903 4.730 8.20e-01 60.8
## 7 7 0.927 -1.080 0.926 3.600 3.70e-01 52.9
## 8 8 0.949 -1.030 0.941 2.900 1.83e-01 47.1
## 9 9 0.945 -1.000 0.936 2.410 8.96e-02 42.5
## 10 10 0.931 -0.979 0.915 2.060 4.43e-02 38.7
## 11 12 0.913 -0.957 0.893 1.580 1.19e-02 32.9
## 12 14 0.938 -0.941 0.921 1.260 3.53e-03 28.6
## 13 16 0.961 -0.916 0.950 1.040 1.06e-03 25.2
## 14 18 0.950 -0.914 0.936 0.877 3.17e-04 22.5
## 15 20 0.922 -0.922 0.900 0.750 1.09e-04 20.3
## [1] "SoftPower chosen to 3"
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 1344.
## pickSoftThreshold: calculating connectivity for given powers...
## ..working on genes 1 through 1344 of 1344
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.0484 1.040 0.955 193.000 1.92e+02 277.00
## 2 2 0.1070 -0.667 0.930 45.900 4.40e+01 96.60
## 3 3 0.7140 -1.630 0.892 15.100 1.33e+01 53.20
## 4 4 0.9340 -1.860 0.958 6.530 4.71e+00 37.10
## 5 5 0.9530 -1.730 0.946 3.490 1.91e+00 29.20
## 6 6 0.9710 -1.550 0.966 2.190 8.44e-01 24.40
## 7 7 0.9840 -1.410 0.983 1.550 3.96e-01 21.10
## 8 8 0.9820 -1.310 0.982 1.180 1.97e-01 18.60
## 9 9 0.9770 -1.240 0.975 0.944 1.02e-01 16.70
## 10 10 0.9420 -1.200 0.928 0.786 5.48e-02 15.00
## 11 12 0.9680 -1.100 0.963 0.585 1.70e-02 12.40
## 12 14 0.9280 -1.040 0.912 0.463 5.66e-03 10.50
## 13 16 0.9550 -0.974 0.942 0.381 1.96e-03 8.89
## 14 18 0.9370 -0.936 0.921 0.322 7.01e-04 7.65
## 15 20 0.9000 -0.921 0.872 0.277 2.58e-04 6.93
## [1] "SoftPower chosen to 4"
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 2364.
## pickSoftThreshold: calculating connectivity for given powers...
## ..working on genes 1 through 2364 of 2364
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.1460 1.180 0.959 679.00 667.000 1020.0
## 2 2 0.0603 -0.389 0.918 283.00 266.000 573.0
## 3 3 0.4100 -0.917 0.942 143.00 125.000 365.0
## 4 4 0.6520 -1.180 0.954 80.90 66.500 252.0
## 5 5 0.7580 -1.290 0.957 50.20 39.100 183.0
## 6 6 0.8220 -1.340 0.952 33.30 24.200 139.0
## 7 7 0.8580 -1.350 0.938 23.30 15.700 109.0
## 8 8 0.9270 -1.320 0.962 17.10 10.400 87.5
## 9 9 0.9580 -1.310 0.981 12.90 7.090 73.5
## 10 10 0.9530 -1.340 0.970 10.10 4.920 64.7
## 11 12 0.9550 -1.390 0.970 6.64 2.570 53.3
## 12 14 0.9640 -1.420 0.988 4.70 1.450 45.7
## 13 16 0.9590 -1.400 0.980 3.52 0.874 40.3
## 14 18 0.9530 -1.390 0.979 2.76 0.548 36.1
## 15 20 0.9500 -1.370 0.974 2.24 0.352 32.9
## [1] "SoftPower chosen to 8"
deepSplit integer value between 0 and 4. Provides a simplified control over how sensitive module detection should be to module splitting, with 0 least and 4 most sensitive. See cutreeDynamic for more details.
https://support.bioconductor.org/p/104602/ mergeCutHeight I am not aware of a principle from which one could derive an “appropriate” value, but in practice, on data sets with 50-100 samples, using 0.15 to 0.2 has worked fairly well. For fewer samples a larger value (0.25 to 0.3) may be warranted. If you want larger modules, increase the value; if you want smaller modules at the risk of having redundant modules (modules with very similar functional annotation and very similar fuzzy module membership), you can decrease the value to say 0.10, maybe even lower if you have lots of samples (hundreds or more). https://support.bioconductor.org/p/101579/ reassignThreshold First things first: grey is not really a module, it is a label for unassigned genes, and the eigengene and kME for the grey “module” are more or less meaningless. In other words, ignore the eigengene and kME values for the grey “module”.
WGCNA assigns module labels using dynamic tree cut (look up dynmaicTreeCut) of hierarchical clustering tree that is based on the Toplogical Overlap Measure (TOM). TOM results in similar but not quite the same similarity as correlation, hence for some genes the assigned module may differ from the module with highest kME. Module merging can also play a part here.
Practically speaking, genes will have a high kME to their assigned module. When assigned module and module of highest kME differ, the gene probably has high kME to both and can be considered intermediate between the two modules.
I don’t really recommend this, but if you absolutely want all genes to be assigned to the module of highest kME, try using argument reassignThreshold=1 to blockwiseModules. This will re-assign all genes to the module of their highest kME after the initial modules have been identified. Note though that the reassignment is not iterated with module eigengene re-calculation.
In all, I don’t worry about the module assignment vs. max. kME differences in my own analyses, and I recommend not worrying it about it to others as well.
https://cran.r-project.org/web/packages/WGCNA/WGCNA.pdf maxPOutliers maxPOutliers specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if higher percentile than maxPOutliers is considered an outlier by the weight function based on 9*mad(x), the width of the weight function is increased such that the percentile of outliers on that side of the median equals maxPOutliers. Using maxPOutliers=1 will effectively disable all weight function broadening; using maxPOutliers=0 will give results that are quite similar (but not equal to) Pearson correlation.
minCoreKMESize a number between 0 and 1. If a detected module does not have at least minModuleKMESize genes with eigengene connectivity at least minCoreKME, the module is disbanded (its genes are unlabeled and returned to the pool of genes waiting for mofule detection).
minKMEtoStay
genes whose eigengene connectivity to their module eigengene is lower
than minKMEtoStay are removed from the module.
for (Datasetname in names(WGCNA_list)) {
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
softPower <- WGCNA_list[[Datasetname]]$softPower
##################
#blockwiseModules#
##################
print(Datasetname)
print(paste("SoftPower chosen to", softPower))
if (Datasetname %in% c("OE", "GC")) {
maxPOutliers_value <- 0.05
mergeCutHeight_value <- 0.15
deepSplit_value <- 3
} else if (Datasetname %in% c("WF")) {
maxPOutliers_value <- 0.1
mergeCutHeight_value <- 0.3
deepSplit_value <- 2
}
network = blockwiseModules(omics_data,
power = st,
networkType = "signed",
TOMType = "signed",
corType = "bicor",
maxPOutliers = maxPOutliers_value,
minModuleSize = 3,
#reassignThreshold = 0,
minCoreKME = 0.5, # Default 0.5
#minCoreKMESize = 3, # Default minModuleSize/3,
minKMEtoStay = 0.5, # Default 0.3 High intramodular correlation
deepSplit = deepSplit_value,
mergeCutHeight = mergeCutHeight_value,
numericLabels = TRUE,
pamRespectsDendro = FALSE,
saveTOMs = TRUE,
saveTOMFileBase = paste(Species,Tissue,Type,Season,Datasetname, "TOM", sep="_"),
verbose = 3)
#Strand et., 2021
# network <- blockwiseModules(omics_data,
# power = st,
# networkType = "signed",
# TOMType = "signed",
# corType = "bicor",
# maxPOutliers = 0.05,
# deepSplit = 4, # Default 2
# minModuleSize = 2, # 30
# minCoreKME = 0.5, # Default 0.5
# minCoreKMESize = 2, # Default minModuleSize/3,
# minKMEtoStay = 0.5, # Default 0.3
# reassignThreshold = 0, # Default 1e-6
# mergeCutHeight = 0.4, # Default 0.15
# pamStage = FALSE,
# pamRespectsDendro = TRUE,
# replaceMissingAdjacencies = TRUE,
# numericLabels = TRUE,
# saveTOMs = TRUE,
# saveTOMFileBase = paste(Species,Tissue,Type,Season, "TOM", sep="_"),
# verbose = 3)
moduleLabels <- network$colors
moduleColors <- labels2colors(network$colors)
MEs <- network$MEs
geneTree <- network$dendrograms[[1]]
WGCNA_list[[Datasetname]][["MEs"]] <- MEs
WGCNA_list[[Datasetname]][["moduleLabels"]] <- moduleLabels
WGCNA_list[[Datasetname]][["moduleColors"]] <- moduleColors
WGCNA_list[[Datasetname]][["geneTree"]] <- geneTree
WGCNA_list[[Datasetname]][["network"]] <- network #we also save the whole network as its small from 16S
}
## [1] "OE"
## [1] "SoftPower chosen to 3"
## Calculating module eigengenes block-wise from all genes
## Flagging genes and samples with too many missing values...
## ..step 1
## ..Working on block 1 .
## TOM calculation: adjacency..
## ..will use 8 parallel threads.
## Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
## ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_OE_TOM-block.1.RData
## ....clustering..
## ....detecting modules..
## ....calculating module eigengenes..
## ....checking kME in modules..
## ..removing 239 genes from module 1 because their KME is too low.
## ..removing 168 genes from module 2 because their KME is too low.
## ..removing 147 genes from module 3 because their KME is too low.
## ..removing 38 genes from module 4 because their KME is too low.
## ..removing 29 genes from module 5 because their KME is too low.
## ..removing 42 genes from module 6 because their KME is too low.
## ..removing 28 genes from module 7 because their KME is too low.
## ..removing 27 genes from module 8 because their KME is too low.
## ..removing 5 genes from module 9 because their KME is too low.
## ..removing 3 genes from module 10 because their KME is too low.
## ..removing 5 genes from module 11 because their KME is too low.
## ..reassigning 7 genes from module 2 to modules with higher KME.
## ..reassigning 1 genes from module 4 to modules with higher KME.
## ..reassigning 1 genes from module 9 to modules with higher KME.
## ..merging modules that are too close..
## mergeCloseModules: Merging modules whose distance is less than 0.15
## Calculating new MEs...
## [1] "GC"
## [1] "SoftPower chosen to 4"
## Calculating module eigengenes block-wise from all genes
## Flagging genes and samples with too many missing values...
## ..step 1
## ..Working on block 1 .
## TOM calculation: adjacency..
## ..will use 8 parallel threads.
## Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
## ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_GC_TOM-block.1.RData
## ....clustering..
## ....detecting modules..
## ....calculating module eigengenes..
## ....checking kME in modules..
## ..removing 123 genes from module 1 because their KME is too low.
## ..removing 175 genes from module 2 because their KME is too low.
## ..removing 60 genes from module 3 because their KME is too low.
## ..removing 58 genes from module 4 because their KME is too low.
## ..removing 58 genes from module 5 because their KME is too low.
## ..removing 14 genes from module 6 because their KME is too low.
## ..removing 6 genes from module 7 because their KME is too low.
## ..removing 11 genes from module 8 because their KME is too low.
## ..removing 10 genes from module 9 because their KME is too low.
## ..removing 6 genes from module 10 because their KME is too low.
## ..removing 3 genes from module 11 because their KME is too low.
## ..removing 1 genes from module 13 because their KME is too low.
## ..removing 6 genes from module 14 because their KME is too low.
## ..removing 2 genes from module 15 because their KME is too low.
## ..removing 1 genes from module 16 because their KME is too low.
## ..reassigning 3 genes from module 1 to modules with higher KME.
## ..merging modules that are too close..
## mergeCloseModules: Merging modules whose distance is less than 0.15
## Calculating new MEs...
## [1] "WF"
## [1] "SoftPower chosen to 8"
## Calculating module eigengenes block-wise from all genes
## Flagging genes and samples with too many missing values...
## ..step 1
## ..Working on block 1 .
## TOM calculation: adjacency..
## ..will use 8 parallel threads.
## Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
## ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_WF_TOM-block.1.RData
## ....clustering..
## ....detecting modules..
## ....calculating module eigengenes..
## ....checking kME in modules..
## ..removing 49 genes from module 1 because their KME is too low.
## ..removing 46 genes from module 2 because their KME is too low.
## ..removing 10 genes from module 3 because their KME is too low.
## ..removing 8 genes from module 4 because their KME is too low.
## ..removing 30 genes from module 5 because their KME is too low.
## ..removing 14 genes from module 6 because their KME is too low.
## ..removing 8 genes from module 7 because their KME is too low.
## ..removing 11 genes from module 8 because their KME is too low.
## ..removing 5 genes from module 9 because their KME is too low.
## ..removing 6 genes from module 10 because their KME is too low.
## ..removing 17 genes from module 11 because their KME is too low.
## ..removing 1 genes from module 12 because their KME is too low.
## ..removing 1 genes from module 13 because their KME is too low.
## ..removing 4 genes from module 14 because their KME is too low.
## ..removing 1 genes from module 15 because their KME is too low.
## ..removing 6 genes from module 16 because their KME is too low.
## ..removing 5 genes from module 17 because their KME is too low.
## ..removing 7 genes from module 18 because their KME is too low.
## ..removing 10 genes from module 20 because their KME is too low.
## ..removing 4 genes from module 21 because their KME is too low.
## ..removing 9 genes from module 22 because their KME is too low.
## ..removing 6 genes from module 23 because their KME is too low.
## ..removing 9 genes from module 24 because their KME is too low.
## ..removing 1 genes from module 25 because their KME is too low.
## ..removing 4 genes from module 26 because their KME is too low.
## ..removing 4 genes from module 27 because their KME is too low.
## ..removing 1 genes from module 28 because their KME is too low.
## ..removing 1 genes from module 30 because their KME is too low.
## ..removing 1 genes from module 31 because their KME is too low.
## ..removing 10 genes from module 32 because their KME is too low.
## ..removing 2 genes from module 33 because their KME is too low.
## ..removing 2 genes from module 35 because their KME is too low.
## ..removing 1 genes from module 36 because their KME is too low.
## ..removing 2 genes from module 38 because their KME is too low.
## ..merging modules that are too close..
## mergeCloseModules: Merging modules whose distance is less than 0.3
## Calculating new MEs...
for (Datasetname in names(WGCNA_list)) {
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
#####################
#plotDendroAndColors#
#####################
##############################
#Number of ASV in each module#
##############################
as.data.frame(table(moduleLabels)) %>%
dplyr::rename(Module = moduleLabels, Size = Freq) %>%
dplyr::mutate(Module_color = labels2colors(as.numeric(as.character(Module)))) -> module_size
module_size %>%
ggplot(aes(x = Module, y = Size, fill = Module)) +
geom_col(color = "#000000") +
ggtitle("Number of genes in each module") +
theme(legend.position = "none") +
scale_fill_manual(values = setNames(module_size$Module_color,module_size$Module)) +
geom_text(aes(label = Size),vjust = 0.5, hjust = -0.18, size = 3.5) +
ylim(0, max(module_size$Size)*1.1) +
theme(plot.margin = margin(2, 2, 2, 2, "pt")) +
coord_flip() -> ASVModuleSizePlot
ASVModuleSizePlot
ggsave(ASVModuleSizePlot, filename = paste(paste(save_name, "ASVModulePlot" , Datasetname, sep="_"),
".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)
WGCNA_list[[Datasetname]][["plots"]][["ASVModuleSizePlot"]] <-ASVModuleSizePlot
#######################################
#Module-Eigenegene-Correlation-Heatmap#
#######################################
MEs_R <- bicor(MEs, MEs, maxPOutliers = 0.05)
idx.r <- which(rownames(MEs_R) == "ME0")
idx.c <- which(colnames(MEs_R) == "ME0")
MEs_R_noME0 <- MEs_R[-idx.r, -idx.c]
MEs_R_noME0[upper.tri(MEs_R_noME0)] %>%
as.data.frame() %>%
dplyr::rename("correlation" = ".") %>%
ggplot(aes(x=correlation)) +
geom_histogram(bins = 20) +
#geom_density() +
xlim(-1, 1) +
ggtitle(paste0(prefix,"ME correlation\n w/o ",prefix ,"ME0")) -> MEs_R_density
WGCNA_list[[Datasetname]][["plots"]][["MEs_R_density"]] <-MEs_R_density
pheatmap::pheatmap(MEs_R_noME0, color = colorRampPalette(c("Blue", "White", "Red"))(100),
silent = T,
breaks = seq(-1,1,length.out = 101),
treeheight_row = 5,
treeheight_col = 5,
main = paste0(prefix,"ME correlation heatmap w/o ",prefix ,"ME0"),
labels_row = paste0(prefix, rownames(MEs_R)),
labels_col = paste0(prefix, colnames(MEs_R))) -> MEs_R_Corr
ggsave(MEs_R_Corr, filename = paste(paste(save_name, "MEs_R_Corr" , Datasetname, sep="_"),".png", sep=""),
path = pathPlots, device='png', dpi=300, width = 8, height = 6)
WGCNA_list[[Datasetname]][["plots"]][["MEs_R_Corr"]] <-MEs_R_Corr
MEs_R_Corr
cowplot::plot_grid(MEs_R_density, MEs_R_Corr$gtable, labels = c("D", "E"), rel_widths = c(0.6, 1)) ->
density_eigen
density_eigen
#######################
#Network-Visualization#
#######################
# 5 Visualization of networks within R
# 5.a Visualizing the gene network
# One way to visualize a weighted network is to plot its heatmap, Fig. 1. Each row and column of the heatmap
# correspond to a single gene. The heatmap can depict adjacencies or topological overlaps, with light colors denoting
# low adjacency (overlap) and darker colors higher adjacency (overlap). In addition, the gene dendrograms and module
# colors are plotted along the top and left side of the heatmap. The package provides a convenient function to create
# such network plots; Fig. 1 was created using the following code. This code can be executed only if the network
# was calculated using a single-block approach (that is, using the 1-step automatic or the step-by-step tutorials). If
# the networks were calculated using the block-wise approach, the user will need to modify this code to perform the
# visualization in each block separately. The modi cation is simple and we leave it as an exercise for the interested
# reader.
# Calculate topological overlap anew: this could be done more efficiently by saving the TOM
# calculated during module detection, but let us do it again here.
# dissTOM = 1-TOMsimilarityFromExpr(omics_data, power = softPower);
# # Transform dissTOM with a power to make moderately strong connections more visible in the heatmap
# plotTOM = dissTOM^10;
# # Set diagonal to NA for a nicer plot
# diag(plotTOM) = NA;
# # Call the plot function
# TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")
#############################
# Correlation within modules# from Strand et al, 2021
#############################
corr_within_module <- function(omics_data, network, module_x = 1){
idx.omics_data <- which(network$colors == module_x)
idx.me <- which(colnames(network$MEs) == paste0("ME",module_x))
kME_x <- bicor(omics_data[,idx.omics_data], network$MEs[,idx.me], maxPOutliers = 0.05)
kME_x}
ggplot.list <- list()
for(m in colnames(network$MEs)){
h <- as.numeric(sub("ME","", m))
data.frame(x = suppressWarnings(corr_within_module(omics_data = omics_data, network = network,
module_x = h))) %>%
ggplot() +
geom_histogram(aes(x), fill = labels2colors(h), color = "black", alpha = 0.5, bins = 20) +
xlim(-1, 1) +
xlab("ASV correlation")+
ggtitle(paste0(prefix,m)) -> da_plot
ggplot.list[[m]] <- da_plot}
ggplot.list <- ggplot.list[ggplot.list %>% names() %>% sub("ME", "", .) %>% as.numeric() %>% order()]
cowplot::plot_grid(plotlist = ggplot.list, ncol = 3) -> density_all_plot
density_all_plot
ggsave(density_all_plot, filename = paste(paste(save_name, "WithinModuleCorrelation" ,Datasetname, sep="_"),".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 12)
WGCNA_list[[Datasetname]][["plots"]][["density_all_plot"]] <-density_all_plot
}
##############
#Summary Plot#
##############
for (Datasetname in names(WGCNA_list)) {
plots <- WGCNA_list[[Datasetname]][["plots"]]
cowplot::plot_grid(plots$si_mc_plot, plots$ASVModuleSizePlot, labels = c("","C"), ncol = 2) -> part_1
cowplot::plot_grid(part_1, plots$density_eigen, labels = c("", ""), ncol = 1, rel_widths = c(1,0.5)) -> part_2
cowplot::plot_grid(part_2, plots$density_all_plot, labels = c("", "F"), rel_widths = c(1,0.1), ncol = 1) -> part_3
ggsave(part_2, filename = paste(paste(save_name, "SSU_Network", Datasetname, Date, sep="_"),".png", sep=""),
path = pathPlots , device='png', dpi=300, width = 12, height = 10)
plot(part_2)
}
for (Datasetname in names(WGCNA_list)) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
traitData <- WGCNA_list[[Datasetname]]$traitData
###################
#Self made Heatmap#
###################
#https://bioinformaticsworkbook.org/tutorials/wgcna.html#gsc.tab=0
# Define numbers of genes and samples
MEs = orderMEs(MEs)
names(MEs) = names(MEs) %>% gsub("ME","", .) %>% paste("SSU",., sep="")
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1), ")", sep = "")
dim(textMatrix) <- dim(moduleTraitCor)
# Add treatment names
MEs$treatment = row.names(MEs)
mat <- as.data.frame(t(moduleTraitCor))
mat$traits <- rownames(mat)
# tidy & plot data
module_order = names(MEs)
mME = mat %>%
pivot_longer(-traits) %>%
mutate(#name = gsub("ME", "", name),
name = factor(name, levels = module_order))
textMatrixLong <- as.data.frame(t(signif(moduleTraitCor, 2)))
textMatrixLong$traits <- rownames(textMatrixLong)
textMatrixLong = textMatrixLong %>%
pivot_longer(-traits) %>%
mutate(
#name = gsub("ME", "", name),
name = factor(name, levels = module_order))
textMatrixLong <- as.data.frame(textMatrixLong)
textMatrixLong2 <- as.data.frame(t(signif(moduleTraitPvalue, 1)))
textMatrixLong2$traits <- rownames(textMatrixLong2)
textMatrixLong2 = textMatrixLong2%>%
pivot_longer(-traits) %>%
mutate(
name = gsub("ME", "", name),
name = factor(name, levels = module_order))
textMatrixLong2 <- as.data.frame(textMatrixLong2)
## add gene counts per module
genesmod<- as.data.frame(moduleLabels)
genesmod$genes <- rownames(genesmod)
genesmod$Modules <- paste("SSU",genesmod$moduleLabels, sep="") #labels2colors(genesmod$moduleLabels)
ModCount <- as.data.frame(genesmod %>%
dplyr::group_by(Modules) %>%
dplyr::summarise(n = n()) )
ModCount <- ModCount[order(as.numeric(ModCount$n), decreasing = T),]
HM <- mME %>% ggplot(., aes(x=traits, y=name, fill=value)) +
geom_tile() +
scale_fill_gradient2(
low = "steelblue1",
high = "red",
mid = "white",
midpoint = 0,
limit = c(-0.7,0.7)) +
theme(axis.text.x = element_text(angle=90)) +
labs(title = paste("Module-trait Relationships", Datasetname, sep = " "), y = "Modules", fill="corr") +
geom_text(aes(label=textMatrixLong$value), position=position_nudge(y=0.2),
size=2.5, colour="grey20") +
geom_text(aes(label=paste0("(",textMatrixLong2$value,")")), position=position_nudge(y=-0.2),
colour="grey20", size=2.5) +
theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
prow <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
ModuleHeatmap<- plot_grid(prow, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
ggsave(ModuleHeatmap, filename = paste(paste(save_name, "ModuleHeatmap", Datasetname, Date, sep="_"),
".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)
WGCNA_list[[Datasetname]][["plots"]][["ModuleHeatmap"]] <- ModuleHeatmap
mME<- mME%>% mutate(Category = case_when((traits %in% c(
"NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp", "SPM", "Salinity"
)) ~ "Abiotics",
(traits %in% c(
"HSI", "SSI", "GSI", "FCF", "Age", "Length", "FillLevel")) ~ "Physiology"))
Order<- c("Abiotics", "Physiology")
HM <- mME %>% ggplot(., aes(x=name, y=factor(traits, levels=traitData), fill=value)) +
geom_tile() +
scale_fill_gradientn(
colours = c("steelblue1", "white", "red"),
limit = c(-1,1),
values = scales::rescale(c(-1, -0.3, 0, 0.3, 1))) +
scale_x_discrete(limits=ModCount$Modules, labels=paste(ModCount$Modules, ": ", ModCount$n)) +
facet_grid(factor(mME$Category, levels=Order), scales = "free", space = "free") +
theme(axis.text.x = element_text(angle=90)) +
labs(title = paste("Module-trait Relationships", Datasetname, sep = " "), y = "Modules", fill="corr") +
geom_text(aes(label=textMatrixLong$value), position=position_nudge(y=0.2),
size=2.5, colour="grey20") +
geom_text(aes(label=paste0("(",textMatrixLong2$value,")")), position=position_nudge(y=-0.2),
colour="grey20", size=2.5) +
theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
ModuleHeatmap2 <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
ModuleHeatmap2 <- plot_grid(ModuleHeatmap2, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
plot(ModuleHeatmap2)
ggsave(ModuleHeatmap2, filename = paste(paste(save_name, "ModuleHeatmap-2", Datasetname, Date, sep="_"),".png",
sep=""), path = pathPlots, device='png', dpi=300, width = 9, height = 6)
WGCNA_list[[Datasetname]][["plots"]][["ModuleHeatmap2"]] <- ModuleHeatmap2
##################
#Specific Modules#
##################
# Module <- "darkturquoise"
# HM <- mME[mME$name == Module,] %>% ggplot(., aes(x=name, y=factor(traits, levels=traitData), fill=value)) +
# geom_tile() +
# scale_fill_gradientn(
# colours = c("steelblue1", "white", "red"),
# limit = c(-1,1),
# values = scales::rescale(c(-1, -0.3, 0, 0.3, 1))) +
# # scale_fill_gradient2(
# # low = "steelblue1",
# # high = "red",
# # mid = "white",
# # midpoint = 0,
# # limit = c(-1,1))
# #facet_grid(factor(mME$Category, levels=Order), scales = "free", space = "free") +
# theme(axis.text.x = element_text(angle=90)) +
# labs(title = "", y = "Modules", fill="corr") +
# geom_text(aes(label=textMatrixLong[textMatrixLong$name == Module,]$value), position=position_nudge(y=0.2),
# size=3, colour="grey20") +
# geom_text(aes(label=paste0("(",textMatrixLong2[textMatrixLong2$name == Module,]$value,")")), position=position_nudge(y=-0.2),
# colour="grey20", size=3) +
# theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
# theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
# prow <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
# A<- plot_grid(prow, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
# plot(A)
# ggsave(A, filename = paste("WGCNA-ModuleHeatmap", Module, sep="_"), path = pathPlots,
# device='png', dpi=300, width = 2.8,height = 7)
}
for (Datasetname in names(WGCNA_list)) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
SSUWGCNAlist <- WGCNA_list[[Datasetname]]$SSUWGCNAlist
for (MODULE in names(SSUWGCNAlist)) {
tryCatch({
genes_of_interest <- names(omics_data[moduleLabels == paste(sub("SSU", "", MODULE))])
print(MODULE)
#print(head(names(ExpressionSet[moduleLabels == paste(sub("ME", "", MODULE))])))
FILENAME <- paste(paste(save_name, "Heatplot", MODULE, Datasetname, sep="_"),".png", sep="")
TITLE <- draw_label_themeRKwhite(paste(Datasetname, MODULE),
element = "plot.title", x = 0.05, hjust = 0, vjust = 1)
#For any unknown reason gene names like trnat-ugu_39 get changes by WGCNA to trnat.ugu_39
rowclusternum <- 1
columnclusternum <- 1
require(DESeq2)
require(tidyverse)
require(ggplot2)
BacteriaHeatPlotRKnoClust(clr = omics_data, Species = Datasetname, min_count = 10,
genes_of_interest = genes_of_interest, Samples = sample_names(ps),
SAMDF = sample_data(ps), TITLE = TITLE, filename= FILENAME,
OutlineColor = "grey20")
if (MODULE == "ME5") {
plot(A)
} #else {
#print("Plots are saved to the pathPlots")
#}
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
#-
ModsOfInterst_list <- list()
for (Datasetname in names(WGCNA_list)) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
traitData <- WGCNA_list[[Datasetname]]$traitData
ModsOfInterst_list[[ paste("ModsOfInterest", Datasetname, sep="_")]] <- list()
###############################################################################
#Create Dataframes of significant correlation between RNAModules and TraitData#
###############################################################################
p.value_matr <- corr.value_matr <- matrix(ncol = ncol(datTraits),
nrow = ncol(MEs),
dimnames = list(colnames(MEs),
colnames(datTraits )))
for(ii in 1:ncol(MEs)){
for(j in 1:ncol(datTraits)){
cor.res <- cor.test(MEs[,ii], datTraits[,j])
p.value_matr[ii, j] <- cor.res$p.value
corr.value_matr[ii, j] <- cor.res$estimate
}
}
# Correct for number of tests
p.value_matr.adjust <- p.adjust(p.value_matr, method = "fdr")
dim(p.value_matr.adjust) <- dim(p.value_matr)
dimnames(p.value_matr.adjust) <- list(colnames(MEs), colnames(datTraits))
# Collect all results into a list.
Traits_corr <- list(p_value =as.data.frame(p.value_matr),
p_value_adj = as.data.frame(p.value_matr.adjust),
correlation = as.data.frame(corr.value_matr))
ModsOfInterst <- list()
for (iii in names(Traits_corr$correlation)){
#aa <- length(ModsOfInterst)
# if (iii %in% c("NONE")) {
# A <- Traits_corr$correlation[iii][Traits_corr$correlation[iii] > abs(0.3), , drop=FALSE]
# B <- Traits_corr$p_value_adj[iii][Traits_corr$p_value_adj[iii] < 0.05, , drop = FALSE]
# BB <- B[rownames(B) %in% rownames(A), , drop = FALSE]
# BBB <- cbind(A[rownames(A) %in% rownames(BB), , drop = FALSE], BB)
# names(BBB) <- c("correlation", "p_value_adj")
# }
# else {
A <- Traits_corr$correlation[iii][abs(Traits_corr$correlation[iii]) > 0.3, , drop=FALSE]
B <- Traits_corr$p_value_adj[iii][Traits_corr$p_value_adj[iii] < 0.05, , drop = FALSE]
BB <- B[rownames(B) %in% rownames(A), , drop = FALSE]
BBB <- cbind(A[rownames(A) %in% rownames(BB), , drop = FALSE], BB)
names(BBB) <- c("correlation", "p_value_adj")
# }
ModsOfInterst[[iii]] <- BBB
}
ModsOfInterst <- Filter(function(df) nrow(df) > 0, ModsOfInterst)
ModsOfInterst_list[[ paste("ModsOfInterest", Datasetname, sep="_")]] <- ModsOfInterst
WGCNA_list[[Datasetname]][["ModsOfInterst_list"]] <- ModsOfInterst_list
}
#############
#Save as CSV#
#############
for (Datasetname in names(WGCNA_list)) {
data <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest",
Datasetname, sep="_")]]
#data.frame(ID = rep(names(data), sapply(data, length)), Obs = unlist(data))
#data.table::rbindlist(data, idcol="df")
data <- data %>%
imap_dfr(~ .x %>%
mutate(TraitOfInterest = .y))
split_names <- str_split_fixed(rownames(data), "\\.", 2)
data <- cbind(
data,
Module = split_names[, 1])
rownames(data) <- NULL
data$MODULE <- sub("ME", "", data$Module)
data$Mod <- paste(Datasetname, data$MODULE, sep="")
#data <- data %>%
# arrange(MODULE)
write.csv2(data , file = paste0(file.path(path_Output_16S,
paste(paste("SSU_ModulesOfInterst", Datasetname, sep="_"),
"csv", sep="."))))
}
for (Datasetname in names(WGCNA_list)) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
ModsOfInterest <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest",
Datasetname, sep="_")]]
data <- ModsOfInterest %>%
imap_dfr(~ .x %>%
mutate(TraitOfInterest = .y))
split_names <- str_split_fixed(rownames(data), "\\.", 2)
data <- cbind(
data,
Module = split_names[, 1])
rownames(data) <- NULL
data$MODULE <- sub("ME", "", data$Module)
data$Mod <- paste(Datasetname, data$MODULE, sep="")
#############################################
#Create Bacterial Counts and Reseq Dataframe#
#############################################
TAX <- as.data.frame(tax_table(ps))
OTU <- as.data.frame(t(otu_table(ps)))
REFSEQ <- refseq(ps)
REFSEQ <- stack(as.character(REFSEQ, use.names=TRUE))
colnames(REFSEQ)[colnames(REFSEQ) == "ind"] <- "ASV"
SeasonSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(Season) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
rownames_to_column(var="ASV")
SSUData <- TAX %>%
rownames_to_column(var = "ASV") %>%
left_join( #Add relative ASVmeans
data.frame(t(phyloseq::otu_table(ps %>%
transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
dplyr::mutate(ASVmeans = rowMeans(.)) %>%
dplyr::mutate(ASV = rownames(.)) %>%
dplyr::select(ASV, ASVmeans)
) %>%
left_join( #Add Sums by Season
SeasonSums
) %>%
left_join( #Add relative ASV data per sample
data.frame(t(phyloseq::otu_table(ps %>%
transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
rownames_to_column(var = "ASV")
) %>%
left_join(REFSEQ
) %>%
dplyr::arrange(desc(ASVmeans))
rownames(SSUData) <- SSUData$ASV
write.csv2(SSUData, paste0(file.path(path_Output_16S, paste(paste(save_name,
paste("SSU_ALL", sep=""), Date, Datasetname, sep="_"),".csv", sep=""))))
############################################################################
#Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
############################################################################
modNames <- names(MEs)
nSamples <- nrow(omics_data)
SSUWGCNAlist <- list()
for (i in names(MEs)) {
a <- length(SSUWGCNAlist)
ModuleOfInterst <- paste(sub("ME", "", i))
geneModuleMembership = as.data.frame(cor(omics_data, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
DatTraitOfInterest <- as.data.frame(datTraits[data[data$Module %in% i,]$TraitOfInterest])
GS_list <- list()
if (ncol(DatTraitOfInterest) > 0) {
for (TraitOfInterest in names(DatTraitOfInterest)) {
geneTraitSignificance =
as.data.frame(WGCNA::cor(omics_data, DatTraitOfInterest[TraitOfInterest], use = "p"));
GSPvalue = as.data.frame(WGCNA::corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(DatTraitOfInterest[TraitOfInterest]), sep="");
names(GSPvalue) = paste("p.GS.", names(DatTraitOfInterest[TraitOfInterest]), sep="");
GS_list[[TraitOfInterest]] <- geneTraitSignificance
}
align_rows <- function(lst) {
common_rows <- Reduce(intersect, lapply(lst, rownames))
aligned_list <- lapply(lst, function(df) df[common_rows, , drop=FALSE])
return(aligned_list)
}
aligned_GS_list <- align_rows(GS_list)
combined_df <- do.call(cbind, aligned_GS_list)
} else {
cat("'TraitOfInterest' does not exist in the names of DatTraitOfInterest.\n")
}
OmicsPerModule <- as.data.frame(names(moduleLabels[moduleLabels == ModuleOfInterst]))
names(OmicsPerModule) <- paste("ME", ModuleOfInterst, sep="")
rownames(OmicsPerModule) <- OmicsPerModule[,1]
SSUWGCNAdata <- OmicsPerModule %>% rownames_to_column("RowName") %>%
left_join(geneModuleMembership[rownames(OmicsPerModule),, drop = FALSE][names(geneModuleMembership) %in%
paste("MMME", ModuleOfInterst, sep="")] %>% rownames_to_column("RowName"), by = "RowName") %>%
column_to_rownames("RowName")
if (ncol(DatTraitOfInterest) > 0) {
SSUWGCNAdata1 <- SSUWGCNAdata %>%
rownames_to_column("RowName") %>%
left_join(combined_df %>%
rownames_to_column("RowName"))
SSUWGCNAdata1$ASV <- SSUWGCNAdata1$RowName
SSUWGCNAdata2 <- SSUWGCNAdata1 %>%
left_join(SSUData[SSUData$ASV %in% SSUWGCNAdata1$RowName,]) %>%
dplyr::arrange(desc(ASVmeans))
} else {
SSUWGCNAdata2 <- SSUWGCNAdata %>%
mutate(ASV = rownames(.)) %>%
left_join(SSUData[SSUData$ASV %in% rownames(SSUWGCNAdata),]) %>%
dplyr::arrange(desc(ASVmeans))
}
# SSUWGCNAdata <- SSUWGCNAdata %>% rownames_to_column("RowName") %>%
# left_join(geneTraitSignificance[rownames(OmicsPerModule),, drop = FALSE][names(geneTraitSignificance) %in%
# paste("GS.", TraitOfInterest, sep="")] %>% rownames_to_column("RowName"), by = "RowName") %>%
# column_to_rownames("RowName")
#
#Reoder by kME/Module Membership
#SSUWGCNAdata <- SSUWGCNAdata %>%
# dplyr::arrange(desc(SSUWGCNAdata[paste("MMME", ModuleOfInterst, sep="")]))
SSUWGCNAlist[[a+1]] <- SSUWGCNAdata2
names(SSUWGCNAlist)[[a+1]] <- paste("SSU",ModuleOfInterst, sep="")
write.csv2(SSUWGCNAdata2, paste0(file.path(path_Output_16S, paste(paste(save_name,
paste("SSU",ModuleOfInterst, sep=""), Date, Datasetname, sep="_"),".csv", sep=""))))
}
WGCNA_list[[Datasetname]][["SSUWGCNAlist"]] <- SSUWGCNAlist
}
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
#Export ASVs as Fasta
# pslist$ps_SL %>%
# refseq() %>%
# Biostrings::writeXStringSet(paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23.asv.fna")), append=FALSE,
# compress=FALSE, compression_level=NA, format="fasta")
# fasta_sequences <- Biostrings::readDNAStringSet(paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23.asv.fna")))
# target_names <- SSUWGCNAlist$SSU8$ASV
# # Subset the sequences based on the target names
# subset_sequences <- fasta_sequences[names(fasta_sequences) %in% target_names]
# Biostrings::writeXStringSet(subset_sequences, paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23_SSU8.asv.fna")))
#######################################
#Extract and Inspect Sequences on NCBI#
#######################################
# minTotRelAbun = 0.001
# x = taxa_sums(pslist$ps_Mock)
# keepTaxa = which((x / sum(x)) > minTotRelAbun)
# prunedSet = prune_taxa(as.character(keepTaxa), pslist$ps_Mock)
#
# prunedSet %>%
# refseq() %>%
# Biostrings::writeXStringSet("~/asv.fna", append=FALSE,
# compress=FALSE, compression_level=NA, format="fasta")
saveRDS(WGCNA_list, file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list", Date, sep="_"), ".rds", sep=""))))
hmps_list <- list()
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
RNA_MEs <- WGCNA_list[[Datasetname]]$MEs
names(RNA_MEs) <- sub("ME", paste(Datasetname), names(RNA_MEs))
WF_MEs <- WGCNA_list[["WF"]]$MEs
names(WF_MEs) <- sub("ME", "WF", names(WF_MEs))
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
traitData <- WGCNA_list[[Datasetname]]$traitData
SAM_MEs <- data.frame(sample_data(WGCNA_list[[Datasetname]]$ps))
ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
round(0) %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)
# WGCNA_NormalHeatmap_RK(
# SAM_MEs = SAM_MEs,
# SSU_MEs = MEs,
# WIDTH = WIDTH,
# HEIGHT = HEIGHT,
# traitData = traitData)
# Normal_hmap
##############################################################
#Creating artificially expanded Bakterioplankton ME dataframe#
##############################################################
ps_WF <- pslist[[paste("ps", "WF", sep="_")]]
Names_Fish <- substring(sample_data(ps)$SampleID, first = 3)
NAME_WF <- substring(rownames(WF_MEs), first = 3)
NAME_WF <- gsub("\\d+$", "", NAME_WF)
NAME_WF <- substr(NAME_WF , 1, nchar(NAME_WF ) - 2)
Names_Fish_with_matching_WF <- Names_Fish[substr(gsub("\\d+$", "", Names_Fish), 1,
nchar(gsub("\\d+$", "", Names_Fish)) - 2) %in% NAME_WF]
print("These Fish samples have real matching Waterfilters from the same Catch")
print(Names_Fish_with_matching_WF)
print("These will be created from related catches")
print(setdiff(Names_Fish, Names_Fish_with_matching_WF))
# matching_WF <- sample_data(ps) |>
# mutate(Names_Fish = gsub("\\d+$", "", substring(SampleID, first = 3))) |>
# mutate(Names_Fish = substr(Names_Fish , 1, nchar(Names_Fish ) - 2))
# matching_WF <- matching_WF[matching_WF$Names_Fish %in% NAME_WF]
WF_ME_Expanded <- list()
for (i in seq_along(Names_Fish)) {
NAME <- Names_Fish[i]
NAME2 <- gsub("\\d+$", "", NAME)
NAME3 <- substr(NAME2 , 1, nchar(NAME2 ) - 2)
NAME4 <- substr(NAME3 , 1, nchar(NAME3 ) - 2)
if (paste("WF", NAME3, "EB", sep="") %in% rownames(WF_MEs)) {
WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME3, "EB", sep=""),]
} else if (paste("WF", NAME3, "FL", sep="") %in% rownames(WF_MEs)){
WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME3, "FL", sep=""),]
} else if (paste("WF", NAME4, "TWEB", sep="") %in% rownames(WF_MEs)){
WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME4, "TWEB", sep=""),]
} else if (paste("WF", NAME4, "MLFL", sep="") %in% rownames(WF_MEs)){
WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME4, "MLFL", sep=""),]
} else {print(NA)}
names(WF_ME_Expanded)[[i]] <- paste("WF", NAME, sep="")
}
WF_ME_Expanded_df <- do.call(rbind, WF_ME_Expanded)
##############################
#Creating Integraded heatmaps#
##############################
hmps_length <- length(hmps_list)
FILENAME <- paste(paste("IntegratedHeatmap", Datasetname, Date, sep="_"),".png", sep="")
WGCNA_IntegratedHeatmap_RK(
SAM_MEs = SAM_MEs,
RNA_MEs = RNA_MEs,#[names(RNA_MEs) != paste(Datasetname, "0", sep="")],
SSU_MEs = WF_ME_Expanded_df[names( WF_ME_Expanded_df) != "WF0"],
WIDTH = 16,
HEIGHT = 5,
OutlineColor = "grey20",
traitData = traitData)
hmps_list[[hmps_length+1]] <- ht_list
names(hmps_list)[[hmps_length+1]] <- paste("ht_list", Datasetname, sep="_")
prow <- cowplot::plot_grid(grid.grabExpr(ComplexHeatmap::draw(ht_list, auto_adjust = FALSE,
background = "transparent",
heatmap_legend_side = "left", annotation_legend_side = "bottom")))
title <- ggdraw()
subtitle <- ggdraw() + draw_label_themeRKwhite(paste("Heatmap MEs", sep = " "),
element = "plot.subtitle",x = 0.05, hjust = 0, vjust = 1)
b <- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0.04, 0.05, 0.98))
#Safe the plots in a specified folder <- may have to change the width and height
plot(prow)
}
## [1] "These Fish samples have real matching Waterfilters from the same Catch"
## [1] "SU21MGEB1" "SU21MGEB4" "SU21MGEB6" "SU21MGEB8" "SU21MGEB9" "SU21MGFL2"
## [7] "SU21MGFL6" "SU21BBEB5" "SU21BBEB6" "SU21BBEB7" "SU21BBEB8" "SU21BBEB9"
## [13] "SU21BBFL1" "SU21BBFL4" "SU21SSEB6" "SU21SSEB7" "SU21SSEB8" "SU21SSEB9"
## [19] "SU21SSFL1" "SU21SSFL2" "SU21SSFL6" "SU21MLEB1" "SU21MLEB2" "SU21MLEB3"
## [25] "SU21MLFL1" "SU21MLFL2" "AU21TWEB1" "AU21TWEB3" "AU21TWEB5" "AU21TWEB7"
## [31] "AU21TWFL1" "AU21TWFL3" "AU21TWFL7" "WI22BBEB1" "WI22BBEB4" "WI22BBEB5"
## [37] "WI22BBEB7" "WI22BBFL1" "WI22BBFL3" "WI22BBFL7" "WI22MGEB1" "WI22MGEB3"
## [43] "WI22MGEB5" "WI22MGEB9" "WI22MGFL1" "WI22MGFL3" "WI22MGFL7" "WI22SSEB1"
## [49] "WI22SSEB5" "WI22SSEB7" "WI22SSEB9" "WI22SSFL1" "WI22SSFL5" "WI22SSFL8"
## [55] "WI22MLEB1" "WI22MLEB3" "WI22MLEB7" "WI22MLEB9" "WI22MLFL1" "WI22MLFL5"
## [61] "WI22MLFL7" "SP22MGEB1" "SP22MGEB3" "SP22MGEB5" "SP22MGEB7" "SP22MGFL1"
## [67] "SP22MGFL3" "SP22MGFL5" "SP22BBEB1" "SP22BBEB3" "SP22BBEB7" "SP22BBEB8"
## [73] "SP22BBFL2" "SP22BBFL3" "SP22BBFL5" "SP22SSEB1" "SP22SSEB2" "SP22SSEB3"
## [79] "SP22SSEB4" "SP22SSFL1" "SP22SSFL3" "SP22SSFL5" "SP22MLEB1" "SP22MLEB2"
## [85] "SP22MLEB3" "SP22MLFL1" "SP22MLFL2" "SP22MLFL3" "SP22MLFL4"
## [1] "These will be created from related catches"
## [1] "SU21TWEB2" "SU21TWEB3" "SU21TWEB4" "SU21TWFL1" "SU21TWFL3" "SU21TWFL5"
## [7] "SU21TWFL7" "AU21MGEB1" "AU21MGEB3" "AU21MGEB5" "AU21MGEB7" "AU21MGFL1"
## [13] "AU21MGFL3" "AU21MGFL5" "AU21BBEB1" "AU21BBEB3" "AU21BBEB6" "AU21BBEB7"
## [19] "AU21BBFL2" "AU21BBFL4" "AU21BBFL6" "AU21SSEB1" "AU21SSEB3" "AU21SSEB5"
## [25] "AU21SSEB8" "AU21SSFL1" "AU21SSFL3" "AU21SSFL6" "AU21MLEB1" "AU21MLEB3"
## [31] "AU21MLEB5" "AU21MLEB7" "AU21MLFL1" "AU21MLFL3" "AU21MLFL7" "WI22TWEB1"
## [37] "WI22TWEB2" "WI22TWEB3" "WI22TWEB4" "WI22TWEB5" "WI22TWEB7" "WI22TWEB9"
## [43] "SP22TWEB1" "SP22TWEB2" "SP22TWEB5" "SP22TWFL1" "SP22TWFL3" "SP22TWFL5"
## [49] "SP22TWFL7"
## [1] "These Fish samples have real matching Waterfilters from the same Catch"
## [1] "SU21BBEB1" "SU21BBEB2" "SU21BBEB4" "SU21BBEB5" "SU21BBFL3" "SU21SSEB1"
## [7] "SU21SSEB2" "SU21SSEB3" "SU21SSEB4" "SU21SSEB8" "SU21SSFL1" "SU21SSFL3"
## [13] "SU21MLEB1" "SU21MLEB2" "SU21MLEB3" "SU21MLEB5" "SU21MLEB6" "SU21MLFL1"
## [19] "SU21MLFL2" "SU21MLFL3" "SU21MLFL6" "AU21TWEB1" "AU21TWFL1" "AU21TWFL2"
## [25] "WI22BBEB1" "WI22BBFL1" "WI22BBFL2" "WI22SSEB1" "WI22SSEB2" "WI22SSEB3"
## [31] "WI22SSFL1" "WI22SSFL2" "WI22MLEB1" "WI22MLFL1" "WI22MLFL2" "SP22BBEB1"
## [37] "SP22BBEB3" "SP22BBEB5" "SP22BBEB7" "SP22BBEB9" "SP22BBFL1" "SP22BBFL2"
## [43] "SP22SSEB1" "SP22SSEB3" "SP22SSEB5" "SP22SSEB7" "SP22SSFL1" "SP22SSFL3"
## [49] "SP22SSFL5" "SP22MLEB1" "SP22MLEB3" "SP22MLEB5" "SP22MLEB7" "SP22MLFL1"
## [55] "SP22MLFL3" "SP22MLFL5"
## [1] "These will be created from related catches"
## [1] "SU21TWEB1" "SU21TWEB2" "SU21TWEB3" "SU21TWEB4" "SU21TWFL1" "SU21TWFL2"
## [7] "SU21TWFL3" "AU21BBEB1" "AU21BBEB2" "AU21BBEB3" "AU21BBFL1" "AU21BBFL2"
## [13] "AU21BBFL3" "AU21BBFL4" "AU21SSEB1" "AU21SSEB3" "AU21SSEB5" "AU21SSEB7"
## [19] "AU21SSFL1" "AU21SSFL3" "AU21SSFL5" "AU21MLEB1" "AU21MLEB2" "AU21MLEB3"
## [25] "AU21MLFL1" "AU21MLFL3" "AU21MLFL5" "AU21MLFL7" "SP22TWEB1" "SP22TWFL1"
## [31] "SP22TWFL2" "SP22TWFL3"
for (Datasetname in names(WGCNA_list)[grepl("WF", names(WGCNA_list))]) {
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
RNA_MEs <- WGCNA_list[[Datasetname]]$MEs
names(RNA_MEs) <- sub("ME", "WF", names(RNA_MEs))
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
traitData <- WGCNA_list[[Datasetname]]$traitData
SAM_MEs <- data.frame(sample_data(WGCNA_list[[Datasetname]]$ps))
ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
round(0) %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)
##############################
#Creating Integraded heatmaps#
##############################
hmps_length <- length(hmps_list)
FILENAME <- paste(paste("IntegratedHeatmap", Datasetname, Date, sep="_"),".png", sep="")
WGCNA_IntegratedHeatmap_Bakterioplankton_RK (
SAM_MEs = SAM_MEs,
RNA_MEs = RNA_MEs,#[names(RNA_MEs) != "WF0"],
WIDTH = 10,
HEIGHT = 5,
OutlineColor = "grey20",
traitData = traitData)
hmps_list[[hmps_length+1]] <- ht_list
names(hmps_list)[[hmps_length+1]] <- paste("ht_list", Datasetname, sep="_")
prow <- cowplot::plot_grid(grid.grabExpr(ComplexHeatmap::draw(ht_list, auto_adjust = FALSE,
background = "transparent",
heatmap_legend_side = "left", annotation_legend_side = "bottom")))
title <- ggdraw()
subtitle <- ggdraw() + draw_label_themeRKwhite(paste("Heatmap MEs", sep = " "),
element = "plot.subtitle",x = 0.05, hjust = 0, vjust = 1)
b <- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0.04, 0.05, 0.98))
#Safe the plots in a specified folder <- may have to change the width and height
plot(prow)
}
ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
round(0) %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)
head(WGCNA_list[[Datasetname]]$SSUWGCNAlist$SSU2$ASVmeans, 5)
## ASV883:Limnohabitans.curvus ASV32:Polynucleobacter
## 0.3236509 0.2902560
## ASV1181:Polaromonas ASV814:Limnohabitans
## 0.2432417 0.1937816
## ASV1318:Limnohabitans
## 0.1697100
Module_Top5_Barplot_list <- list()
bars_list <- list()
for (Datasetname in names(WGCNA_list)[grepl("OE|GC|WF", names(WGCNA_list))]) {
#Datasetname <- sub("ps_","", Dataset)
ps <- pslist[[paste("ps", Datasetname, sep="_")]]
Module_Top5_Barplot_list[[Datasetname]] <- list()
bars_list[[Datasetname]] <- list()
for (Interest_ME in names(WGCNA_list[[Datasetname]]$SSUWGCNAlist)) {
################################################
Interest_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste(Interest_ME,
sep="")]]
print(paste("rel ASVmeans in Module", Interest_ME))
data <- head(
Interest_ME_dat %>%
dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::group_by(ASV) %>%
dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
digits=2))%>%
dplyr::arrange(desc(rel_ASVmeans)) %>%
as.data.frame() %>%
left_join(Interest_ME_dat[c("Phylum", "Class", "Order", "Family", "Genus", "Species", "ASV")])
, 5)
print(head(data, 2))
data$ASVcut <- sub(".*:", "", data$ASV)
result <- data %>%
dplyr::group_by(ASVcut) %>%
dplyr::summarise(rel_ASVmeans = sum(rel_ASVmeans, na.rm = TRUE),
Phylum = dplyr::first(Phylum),
Class = dplyr::first(Class),
Order = dplyr::first(Order),
Family = dplyr::first(Family),
Genus = dplyr::first(Genus),
Species = dplyr::first(Species))
p <- ggplot(result, aes(x = ASVcut, y = rel_ASVmeans, fill = Phylum)) +
geom_bar(stat = "identity", alpha = 0.7) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Relative ASV Means", x = "ASV", y = "Relative ASV Means") +
scale_fill_manual(values = phylum_colors_Cytoscape) +
#scale_fill_brewer(palette = "Set3") +
# annotate("text", x = x_pos, y = y_pos, label = paste(sub("SSU", Datasetname, Interest_ME)), hjust = 0, vjust = 1.5, size = 8, color = "black", fontface = "bold") +
# annotate("text", x = Inf, y = Inf, label = paste(sub("SSU", Datasetname, Interest_ME)), hjust = -0.1, vjust = 1.5) +
coord_flip() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 1))
y_start <- ifelse(range(result$rel_ASVmeans) >= 0 & range(result$rel_ASVmeans) <= 5, 2,
ifelse(range(result$rel_ASVmeans) > 5 & range(result$rel_ASVmeans) <= 10, 4.5,
ifelse(range(result$rel_ASVmeans) > 10 & range(result$rel_ASVmeans) <= 20, 7,
ifelse(range(result$rel_ASVmeans) > 20 & range(result$rel_ASVmeans) <= 50, 18,
ifelse(range(result$rel_ASVmeans) > 50 & range(result$rel_ASVmeans) <= 100, 30,
NA)))))
# Add y-axis labels as annotations starting at the calculated position
p <- p + geom_text(aes(label = ASVcut, y = y_start[2]), vjust = 1, size = 3.8, fontface='bold') +
#p <- p + geom_text(aes(label = ASVcut, y = 5), vjust = 1, size = 4) +
#geom_text(aes(label = ASVcut), vjust = -0.5, size = 3) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
theme(strip.text.y = element_text(angle = 0)) +
theme(
panel.background = element_rect(fill='transparent'), #transparent panel bg
plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
#legend.background = element_rect(fill='transparent'), #transparent legend bg
#legend.box.background = element_rect(fill='transparent') #transparent legend panel
) +
#theme(#axis.title.x.bottom = element_text(color="grey13"),
# strip.text = element_text(color = "black", face= "bold")) +
theme(
#legend.title=element_text(size=12),
#legend.text=element_text(size=12),
axis.text.x.bottom = element_text(size= 12, face = "bold", angle = 45, hjust = 1))+
#strip.text.y.left = element_text(size = 12,face = "bold"),
#axis.title.y.left = element_text(size=12,face = "bold")) +
theme(legend.position = "none") +
#ylab("ASV mean [%]") +
labs(title = "", x = "", y = "") #module ASV mean [%]
prow <- cowplot::plot_grid(p, labels = c(""), ncol = 1)
Module_Top5_Barplot_list[[Datasetname]][[Interest_ME]] <- prow
bars_list[[Datasetname]][[Interest_ME]] <- data.frame(SSU = Interest_ME, bars = dim(result)[1])
}
}
## [1] "rel ASVmeans in Module SSU8"
## ASV rel_ASVmeans Phylum Class
## 1 ASV113:Elizabethkingia 29.03 Bacteroidota Bacteroidia
## 2 ASV351:Citrobacter 8.84 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Flavobacteriales Weeksellaceae Elizabethkingia <NA>
## 2 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## [1] "rel ASVmeans in Module SSU13"
## ASV rel_ASVmeans Phylum Class Order
## 1 ASV525:Elizabethkingia 33.45 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV687:Elizabethkingia 21.91 Bacteroidota Bacteroidia Flavobacteriales
## Family Genus Species
## 1 Weeksellaceae Elizabethkingia <NA>
## 2 Weeksellaceae Elizabethkingia <NA>
## [1] "rel ASVmeans in Module SSU6"
## ASV rel_ASVmeans Phylum Class
## 1 ASV1:Elizabethkingia 40.94 Bacteroidota Bacteroidia
## 2 ASV2:Citrobacter 7.90 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Flavobacteriales Weeksellaceae Elizabethkingia <NA>
## 2 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## [1] "rel ASVmeans in Module SSU12"
## ASV rel_ASVmeans Phylum Class
## 1 ASV79:Citrobacter 22.01 Proteobacteria Gammaproteobacteria
## 2 ASV85:Enterobacteriaceae 20.40 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## 2 Enterobacterales Enterobacteriaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU3"
## ASV rel_ASVmeans Phylum Class
## 1 ASV43:Persicirhabdus 13.44 Verrucomicrobiota Verrucomicrobiae
## 2 ASV93:Persicirhabdus 7.30 Verrucomicrobiota Verrucomicrobiae
## Order Family Genus Species
## 1 Verrucomicrobiales Rubritaleaceae Persicirhabdus <NA>
## 2 Verrucomicrobiales Rubritaleaceae Persicirhabdus <NA>
## [1] "rel ASVmeans in Module SSU10"
## ASV rel_ASVmeans Phylum
## 1 ASV18:Clostridium sensu stricto 1 35.91 Firmicutes
## 2 ASV62:Photobacterium 16.43 Proteobacteria
## Class Order Family
## 1 Clostridia Clostridiales Clostridiaceae
## 2 Gammaproteobacteria Enterobacterales Vibrionaceae
## Genus Species
## 1 Clostridium sensu stricto 1 <NA>
## 2 Photobacterium <NA>
## [1] "rel ASVmeans in Module SSU2"
## ASV rel_ASVmeans Phylum Class
## 1 ASV65:Pseudomonas 13.70 Proteobacteria Gammaproteobacteria
## 2 ASV110:Elizabethkingia 6.79 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Pseudomonadales Pseudomonadaceae Pseudomonas <NA>
## 2 Flavobacteriales Weeksellaceae Elizabethkingia <NA>
## [1] "rel ASVmeans in Module SSU11"
## ASV rel_ASVmeans Phylum Class
## 1 ASV245:Luteolibacter 29.44 Verrucomicrobiota Verrucomicrobiae
## 2 ASV406:Unknown Family 19.56 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Verrucomicrobiales Rubritaleaceae Luteolibacter <NA>
## 2 Gammaproteobacteria Incertae Sedis Unknown Family <NA> <NA>
## [1] "rel ASVmeans in Module SSU5"
## ASV rel_ASVmeans Phylum Class
## 1 ASV36:Alkanindiges 9.89 Proteobacteria Gammaproteobacteria
## 2 ASV27:Psychrobacter 9.39 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Pseudomonadales Moraxellaceae Alkanindiges <NA>
## 2 Pseudomonadales Moraxellaceae Psychrobacter <NA>
## [1] "rel ASVmeans in Module SSU7"
## ASV rel_ASVmeans Phylum Class
## 1 ASV13:Acinetobacter.lwoffii 28.59 Proteobacteria Gammaproteobacteria
## 2 ASV24:Acinetobacter 12.42 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Pseudomonadales Moraxellaceae Acinetobacter lwoffii
## 2 Pseudomonadales Moraxellaceae Acinetobacter <NA>
## [1] "rel ASVmeans in Module SSU4"
## ASV rel_ASVmeans Phylum Class
## 1 ASV145:Hyphomicrobium 4.08 Proteobacteria Alphaproteobacteria
## 2 ASV150:Xanthobacteraceae 3.45 Proteobacteria Alphaproteobacteria
## Order Family Genus Species
## 1 Rhizobiales Hyphomicrobiaceae Hyphomicrobium <NA>
## 2 Rhizobiales Xanthobacteraceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU1"
## ASV rel_ASVmeans Phylum Class Order
## 1 ASV77:Weeksellaceae 4.84 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV90:Chryseobacterium 4.72 Bacteroidota Bacteroidia Flavobacteriales
## Family Genus Species
## 1 Weeksellaceae <NA> <NA>
## 2 Weeksellaceae Chryseobacterium <NA>
## [1] "rel ASVmeans in Module SSU9"
## ASV rel_ASVmeans Phylum Class
## 1 ASV12:Arthrobacter 33.51 Actinobacteriota Actinobacteria
## 2 ASV29:Psychrobacter 18.36 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Micrococcales Micrococcaceae Arthrobacter <NA>
## 2 Pseudomonadales Moraxellaceae Psychrobacter <NA>
## [1] "rel ASVmeans in Module SSU0"
## ASV rel_ASVmeans Phylum Class
## 1 ASV11:Luteolibacter 8.39 Verrucomicrobiota Verrucomicrobiae
## 2 ASV16:Asinibacterium 4.45 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Verrucomicrobiales Rubritaleaceae Luteolibacter <NA>
## 2 Chitinophagales Chitinophagaceae Asinibacterium <NA>
## [1] "rel ASVmeans in Module SSU3"
## ASV rel_ASVmeans Phylum Class
## 1 ASV176:SC-I-84 3.96 Proteobacteria Gammaproteobacteria
## 2 ASV150:Xanthobacteraceae 3.81 Proteobacteria Alphaproteobacteria
## Order Family Genus Species
## 1 Burkholderiales SC-I-84 <NA> <NA>
## 2 Rhizobiales Xanthobacteraceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU5"
## ASV rel_ASVmeans Phylum Class Order
## 1 ASV31:Chryseobacterium 6.19 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV48:Deinococcus 5.72 Deinococcota Deinococci Deinococcales
## Family Genus Species
## 1 Weeksellaceae Chryseobacterium <NA>
## 2 Deinococcaceae Deinococcus <NA>
## [1] "rel ASVmeans in Module SSU9"
## ASV rel_ASVmeans Phylum Class
## 1 ASV291:Altererythrobacter 6.86 Proteobacteria Alphaproteobacteria
## 2 ASV478:Sphingomonadaceae 5.03 Proteobacteria Alphaproteobacteria
## Order Family Genus Species
## 1 Sphingomonadales Sphingomonadaceae Altererythrobacter <NA>
## 2 Sphingomonadales Sphingomonadaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU6"
## ASV rel_ASVmeans Phylum Class
## 1 ASV162:Sphingomonas 6.24 Proteobacteria Alphaproteobacteria
## 2 ASV178:Deinococcus 5.60 Deinococcota Deinococci
## Order Family Genus Species
## 1 Sphingomonadales Sphingomonadaceae Sphingomonas <NA>
## 2 Deinococcales Deinococcaceae Deinococcus <NA>
## [1] "rel ASVmeans in Module SSU10"
## ASV rel_ASVmeans Phylum Class
## 1 ASV39:Dinghuibacter 17.33 Bacteroidota Bacteroidia
## 2 ASV82:Caulobacteraceae 14.20 Proteobacteria Alphaproteobacteria
## Order Family Genus Species
## 1 Chitinophagales Chitinophagaceae Dinghuibacter <NA>
## 2 Caulobacterales Caulobacteraceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU7"
## ASV rel_ASVmeans Phylum Class
## 1 ASV238:Halioglobus 5.24 Proteobacteria Gammaproteobacteria
## 2 ASV289:Ilumatobacter 4.51 Actinobacteriota Acidimicrobiia
## Order Family Genus Species
## 1 Pseudomonadales Halieaceae Halioglobus <NA>
## 2 Microtrichales Ilumatobacteraceae Ilumatobacter <NA>
## [1] "rel ASVmeans in Module SSU8"
## ASV rel_ASVmeans Phylum Class
## 1 ASV1:Elizabethkingia 37.29 Bacteroidota Bacteroidia
## 2 ASV2:Citrobacter 8.70 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Flavobacteriales Weeksellaceae Elizabethkingia <NA>
## 2 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## [1] "rel ASVmeans in Module SSU16"
## ASV rel_ASVmeans Phylum Class
## 1 ASV971:Citrobacter 17.33 Proteobacteria Gammaproteobacteria
## 2 ASV968:Enterobacteriaceae 17.08 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## 2 Enterobacterales Enterobacteriaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU4"
## ASV rel_ASVmeans Phylum Class
## 1 ASV12:Arthrobacter 21.04 Actinobacteriota Actinobacteria
## 2 ASV34:Psychrobacter 7.36 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Micrococcales Micrococcaceae Arthrobacter <NA>
## 2 Pseudomonadales Moraxellaceae Psychrobacter <NA>
## [1] "rel ASVmeans in Module SSU14"
## ASV rel_ASVmeans Phylum Class
## 1 ASV79:Citrobacter 20.61 Proteobacteria Gammaproteobacteria
## 2 ASV85:Enterobacteriaceae 20.32 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter <NA>
## 2 Enterobacterales Enterobacteriaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU13"
## ASV rel_ASVmeans Phylum Class
## 1 ASV1627:Chryseobacterium 21.29 Bacteroidota Bacteroidia
## 2 ASV2217:Deinococcus 19.17 Deinococcota Deinococci
## Order Family Genus Species
## 1 Flavobacteriales Weeksellaceae Chryseobacterium <NA>
## 2 Deinococcales Deinococcaceae Deinococcus <NA>
## [1] "rel ASVmeans in Module SSU2"
## ASV rel_ASVmeans Phylum Class
## 1 ASV13:Acinetobacter.lwoffii 20.16 Proteobacteria Gammaproteobacteria
## 2 ASV24:Acinetobacter 9.96 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Pseudomonadales Moraxellaceae Acinetobacter lwoffii
## 2 Pseudomonadales Moraxellaceae Acinetobacter <NA>
## [1] "rel ASVmeans in Module SSU15"
## ASV rel_ASVmeans Phylum Class
## 1 ASV1245:Pseudomonas 15.65 Proteobacteria Gammaproteobacteria
## 2 ASV822:Microcystis PCC-7914 14.48 Cyanobacteria Cyanobacteriia
## Order Family Genus Species
## 1 Pseudomonadales Pseudomonadaceae Pseudomonas <NA>
## 2 Cyanobacteriales Microcystaceae Microcystis PCC-7914 <NA>
## [1] "rel ASVmeans in Module SSU11"
## ASV rel_ASVmeans Phylum Class
## 1 ASV15:Verticiella 28.09 Proteobacteria Gammaproteobacteria
## 2 ASV32:Polynucleobacter 12.65 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Burkholderiales Alcaligenaceae Verticiella <NA>
## 2 Burkholderiales Burkholderiaceae Polynucleobacter <NA>
## [1] "rel ASVmeans in Module SSU1"
## ASV rel_ASVmeans Phylum Class
## 1 ASV28:Clostridium sensu stricto 1 20.72 Firmicutes Clostridia
## 2 ASV60:Clostridium sensu stricto 1 5.10 Firmicutes Clostridia
## Order Family Genus Species
## 1 Clostridiales Clostridiaceae Clostridium sensu stricto 1 <NA>
## 2 Clostridiales Clostridiaceae Clostridium sensu stricto 1 <NA>
## [1] "rel ASVmeans in Module SSU12"
## ASV rel_ASVmeans Phylum Class
## 1 ASV67:Pseudomonas 34.14 Proteobacteria Gammaproteobacteria
## 2 ASV224:Elizabethkingia 30.77 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Pseudomonadales Pseudomonadaceae Pseudomonas <NA>
## 2 Flavobacteriales Weeksellaceae Elizabethkingia <NA>
## [1] "rel ASVmeans in Module SSU0"
## ASV rel_ASVmeans Phylum
## 1 ASV18:Clostridium sensu stricto 1 5.30 Firmicutes
## 2 ASV66:Rhodobacteraceae 5.02 Proteobacteria
## Class Order Family
## 1 Clostridia Clostridiales Clostridiaceae
## 2 Alphaproteobacteria Rhodobacterales Rhodobacteraceae
## Genus Species
## 1 Clostridium sensu stricto 1 <NA>
## 2 <NA> <NA>
## [1] "rel ASVmeans in Module SSU9"
## ASV rel_ASVmeans Phylum Class
## 1 ASV357:hgcI clade 27.97 Actinobacteriota Actinobacteria
## 2 ASV115:Luteolibacter 23.29 Verrucomicrobiota Verrucomicrobiae
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae hgcI clade <NA>
## 2 Verrucomicrobiales Rubritaleaceae Luteolibacter <NA>
## [1] "rel ASVmeans in Module SSU17"
## ASV rel_ASVmeans Phylum Class
## 1 ASV4933:Ferruginibacter 14.82 Bacteroidota Bacteroidia
## 2 ASV5485:Polynucleobacter 12.33 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Chitinophagales Chitinophagaceae Ferruginibacter <NA>
## 2 Burkholderiales Burkholderiaceae Polynucleobacter <NA>
## [1] "rel ASVmeans in Module SSU4"
## ASV rel_ASVmeans Phylum Class
## 1 ASV94:Sporichthyaceae 17.97 Actinobacteriota Actinobacteria
## 2 ASV72:CL500-29 marine group 12.38 Actinobacteriota Acidimicrobiia
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae <NA> <NA>
## 2 Microtrichales Ilumatobacteraceae CL500-29 marine group <NA>
## [1] "rel ASVmeans in Module SSU16"
## ASV rel_ASVmeans Phylum
## 1 ASV549:Pseudohongiella 20.94 Proteobacteria
## 2 ASV480:Candidatus Symbiobacter 17.38 Proteobacteria
## Class Order Family
## 1 Gammaproteobacteria Pseudomonadales Pseudohongiellaceae
## 2 Gammaproteobacteria Burkholderiales Comamonadaceae
## Genus Species
## 1 Pseudohongiella <NA>
## 2 Candidatus Symbiobacter <NA>
## [1] "rel ASVmeans in Module SSU23"
## ASV rel_ASVmeans Phylum Class
## 1 ASV5775:Candidatus Planktophila 23.02 Actinobacteriota Actinobacteria
## 2 ASV6222:Candidatus Aquirestis 19.68 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae Candidatus Planktophila <NA>
## 2 Chitinophagales Saprospiraceae Candidatus Aquirestis <NA>
## [1] "rel ASVmeans in Module SSU15"
## ASV rel_ASVmeans Phylum Class
## 1 ASV8906:Flavobacterium 10.14 Bacteroidota Bacteroidia
## 2 ASV5933:Gemmatimonas 7.70 Gemmatimonadota Gemmatimonadetes
## Order Family Genus Species
## 1 Flavobacteriales Flavobacteriaceae Flavobacterium <NA>
## 2 Gemmatimonadales Gemmatimonadaceae Gemmatimonas <NA>
## [1] "rel ASVmeans in Module SSU21"
## ASV rel_ASVmeans Phylum Class
## 1 ASV6010:Cytophaga 35.38 Bacteroidota Bacteroidia
## 2 ASV8864:Beijerinckia 16.28 Proteobacteria Alphaproteobacteria
## Order Family Genus Species
## 1 Cytophagales Cytophagaceae Cytophaga <NA>
## 2 Rhizobiales Beijerinckiaceae Beijerinckia <NA>
## [1] "rel ASVmeans in Module SSU2"
## ASV rel_ASVmeans Phylum Class
## 1 ASV883:Limnohabitans.curvus 3.84 Proteobacteria Gammaproteobacteria
## 2 ASV32:Polynucleobacter 3.44 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Burkholderiales Comamonadaceae Limnohabitans curvus
## 2 Burkholderiales Burkholderiaceae Polynucleobacter <NA>
## [1] "rel ASVmeans in Module SSU10"
## ASV rel_ASVmeans Phylum
## 1 ASV751:Flavobacterium.succinicans 11.04 Bacteroidota
## 2 ASV312:Tabrizicola 6.57 Proteobacteria
## Class Order Family Genus
## 1 Bacteroidia Flavobacteriales Flavobacteriaceae Flavobacterium
## 2 Alphaproteobacteria Rhodobacterales Rhodobacteraceae Tabrizicola
## Species
## 1 succinicans
## 2 <NA>
## [1] "rel ASVmeans in Module SSU5"
## ASV rel_ASVmeans Phylum Class
## 1 ASV565:Limnohabitans 12.48 Proteobacteria Gammaproteobacteria
## 2 ASV1161:Methylomonadaceae 4.98 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Burkholderiales Comamonadaceae Limnohabitans <NA>
## 2 Methylococcales Methylomonadaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU12"
## ASV rel_ASVmeans Phylum
## 1 ASV183:Polynucleobacter.difficilis 16.00 Proteobacteria
## 2 ASV745:Pseudarcicella 8.93 Bacteroidota
## Class Order Family Genus
## 1 Gammaproteobacteria Burkholderiales Burkholderiaceae Polynucleobacter
## 2 Bacteroidia Cytophagales Spirosomaceae Pseudarcicella
## Species
## 1 difficilis
## 2 <NA>
## [1] "rel ASVmeans in Module SSU18"
## ASV rel_ASVmeans Phylum Class
## 1 ASV1189:Sulfurifustis 26.90 Proteobacteria Gammaproteobacteria
## 2 ASV562:Woeseia 13.53 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Acidiferrobacterales Acidiferrobacteraceae Sulfurifustis <NA>
## 2 Steroidobacterales Woeseiaceae Woeseia <NA>
## [1] "rel ASVmeans in Module SSU3"
## ASV rel_ASVmeans Phylum Class
## 1 ASV121:Nitrospira 9.26 Nitrospirota Nitrospiria
## 2 ASV309:Limnohabitans 6.61 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Nitrospirales Nitrospiraceae Nitrospira <NA>
## 2 Burkholderiales Comamonadaceae Limnohabitans <NA>
## [1] "rel ASVmeans in Module SSU13"
## ASV rel_ASVmeans Phylum
## 1 ASV13:Acinetobacter.lwoffii 24.03 Proteobacteria
## 2 ASV1676:Roseimicrobium 4.27 Verrucomicrobiota
## Class Order Family Genus
## 1 Gammaproteobacteria Pseudomonadales Moraxellaceae Acinetobacter
## 2 Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae Roseimicrobium
## Species
## 1 lwoffii
## 2 <NA>
## [1] "rel ASVmeans in Module SSU14"
## ASV rel_ASVmeans Phylum
## 1 ASV3310:Rhizobiales Incertae Sedis 9.23 Proteobacteria
## 2 ASV9551:Pseudohongiella 7.14 Proteobacteria
## Class Order Family
## 1 Alphaproteobacteria Rhizobiales Rhizobiales Incertae Sedis
## 2 Gammaproteobacteria Pseudomonadales Pseudohongiellaceae
## Genus Species
## 1 <NA> <NA>
## 2 Pseudohongiella <NA>
## [1] "rel ASVmeans in Module SSU19"
## ASV rel_ASVmeans Phylum
## 1 ASV3352:Steroidobacteraceae 17.90 Proteobacteria
## 2 ASV2884:Pedosphaeraceae 9.89 Verrucomicrobiota
## Class Order Family Genus Species
## 1 Gammaproteobacteria Steroidobacterales Steroidobacteraceae <NA> <NA>
## 2 Verrucomicrobiae Pedosphaerales Pedosphaeraceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU7"
## ASV rel_ASVmeans Phylum Class
## 1 ASV804:Candidatus Aquirestis 8.24 Bacteroidota Bacteroidia
## 2 ASV114:Mycobacterium 7.32 Actinobacteriota Actinobacteria
## Order Family Genus Species
## 1 Chitinophagales Saprospiraceae Candidatus Aquirestis <NA>
## 2 Corynebacteriales Mycobacteriaceae Mycobacterium <NA>
## [1] "rel ASVmeans in Module SSU22"
## ASV rel_ASVmeans Phylum Class
## 1 ASV813:hgcI clade 63.56 Actinobacteriota Actinobacteria
## 2 ASV904:Anaerolineaceae 9.60 Chloroflexi Anaerolineae
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae hgcI clade <NA>
## 2 Anaerolineales Anaerolineaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU11"
## ASV rel_ASVmeans Phylum Class
## 1 ASV4225:Sporichthyaceae 10.82 Actinobacteriota Actinobacteria
## 2 ASV7295:Pseudarcicella 4.85 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae <NA> <NA>
## 2 Cytophagales Spirosomaceae Pseudarcicella <NA>
## [1] "rel ASVmeans in Module SSU8"
## ASV rel_ASVmeans Phylum Class
## 1 ASV824:Woeseia 9.10 Proteobacteria Gammaproteobacteria
## 2 ASV186:OM60(NOR5) clade 8.55 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Steroidobacterales Woeseiaceae Woeseia <NA>
## 2 Pseudomonadales Halieaceae OM60(NOR5) clade <NA>
## [1] "rel ASVmeans in Module SSU20"
## ASV rel_ASVmeans Phylum Class Order
## 1 ASV651:Actibacter 14.94 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV718:Robiginitalea 14.25 Bacteroidota Bacteroidia Flavobacteriales
## Family Genus Species
## 1 Flavobacteriaceae Actibacter <NA>
## 2 Flavobacteriaceae Robiginitalea <NA>
## [1] "rel ASVmeans in Module SSU6"
## ASV rel_ASVmeans Phylum Class
## 1 ASV43:Persicirhabdus 13.93 Verrucomicrobiota Verrucomicrobiae
## 2 ASV466:Ilumatobacter 5.11 Actinobacteriota Acidimicrobiia
## Order Family Genus Species
## 1 Verrucomicrobiales Rubritaleaceae Persicirhabdus <NA>
## 2 Microtrichales Ilumatobacteraceae Ilumatobacter <NA>
## [1] "rel ASVmeans in Module SSU1"
## ASV rel_ASVmeans Phylum Class
## 1 ASV313:NS11-12 marine group 7.37 Bacteroidota Bacteroidia
## 2 ASV380:Cryomorphaceae 6.24 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Sphingobacteriales NS11-12 marine group <NA> <NA>
## 2 Flavobacteriales Cryomorphaceae <NA> <NA>
## [1] "rel ASVmeans in Module SSU24"
## ASV rel_ASVmeans Phylum Class
## 1 ASV7848:Ulvibacter 49.77 Bacteroidota Bacteroidia
## 2 ASV12123:NS11-12 marine group 33.00 Bacteroidota Bacteroidia
## Order Family Genus Species
## 1 Flavobacteriales Flavobacteriaceae Ulvibacter <NA>
## 2 Sphingobacteriales NS11-12 marine group <NA> <NA>
## [1] "rel ASVmeans in Module SSU0"
## ASV rel_ASVmeans Phylum Class
## 1 ASV2334:hgcI clade 3.64 Actinobacteriota Actinobacteria
## 2 ASV423:Methylotenera 3.47 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Frankiales Sporichthyaceae hgcI clade <NA>
## 2 Burkholderiales Methylophilaceae Methylotenera <NA>
####
#OE#
####
cowplot::plot_grid(Module_Top5_Barplot_list$OE$SSU8, Module_Top5_Barplot_list$OE$SSU13,
Module_Top5_Barplot_list$OE$SSU6, Module_Top5_Barplot_list$OE$SSU12,
Module_Top5_Barplot_list$OE$SSU3, Module_Top5_Barplot_list$OE$SSU10,
Module_Top5_Barplot_list$OE$SSU2, nrow=1,
labels = c("OE8", "OE13", "OE6", "OE12", "OE3", "OE10", "OE2")) -> part_1
cowplot::plot_grid( Module_Top5_Barplot_list$OE$SSU11, Module_Top5_Barplot_list$OE$SSU5,
Module_Top5_Barplot_list$OE$SSU7,Module_Top5_Barplot_list$OE$SSU4,
Module_Top5_Barplot_list$OE$SSU1, Module_Top5_Barplot_list$OE$SSU9,
Module_Top5_Barplot_list$OE$SSU0, nrow=1,
labels = c( "OE11", "OE5", "OE7", "OE4", "OE1", "OE9", "OE0")) -> part_2
cowplot::plot_grid(part_1, part_2, nrow= 2) -> part_3
ggsave(part_3, filename = paste(paste("Module_topTaxa_plot",
sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 16.5,height = 6)
plot(part_3)
####
#GC#
####
cowplot::plot_grid(Module_Top5_Barplot_list$GC$SSU3, Module_Top5_Barplot_list$GC$SSU5,
Module_Top5_Barplot_list$GC$SSU9, Module_Top5_Barplot_list$GC$SSU6,
Module_Top5_Barplot_list$GC$SSU10, Module_Top5_Barplot_list$GC$SSU7,
Module_Top5_Barplot_list$GC$SSU8, Module_Top5_Barplot_list$GC$SSU16,
Module_Top5_Barplot_list$GC$SSU4,
nrow=1,
labels = c("GC3", "GC5", "GC9", "GC6", "GC10", "GC7", "GC8", "GC16", "GC4")) -> part_1
cowplot::plot_grid( Module_Top5_Barplot_list$GC$SSU14, Module_Top5_Barplot_list$GC$SSU13,
Module_Top5_Barplot_list$GC$SSU2,Module_Top5_Barplot_list$GC$SSU15,
Module_Top5_Barplot_list$GC$SSU11, Module_Top5_Barplot_list$GC$SSU1,
Module_Top5_Barplot_list$GC$SSU12, Module_Top5_Barplot_list$GC$SSU0,
nrow=1,
labels = c( "GC14", "GC13", "GC2", "GC15", "GC11", "GC1", "GC12", "GC0")) -> part_2
cowplot::plot_grid(part_1, part_2, nrow= 2) -> part_3
ggsave(part_3, filename = paste(paste("Module_topTaxa_plot_GC",
sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 18,height = 6)
plot(part_3)
####
#WF#
####
cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU9, Module_Top5_Barplot_list$WF$SSU17,
Module_Top5_Barplot_list$WF$SSU4, Module_Top5_Barplot_list$WF$SSU16,
Module_Top5_Barplot_list$WF$SSU23, Module_Top5_Barplot_list$WF$SSU15,
Module_Top5_Barplot_list$WF$SSU21, Module_Top5_Barplot_list$WF$SSU2,
Module_Top5_Barplot_list$WF$SSU10,
nrow=1,
labels = c("WF9", "WF17", "WF4", "WF16", "WF23", "WF15", "WF21", "WF2", "WF10")) -> part_1
cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU5, Module_Top5_Barplot_list$WF$SSU12,
Module_Top5_Barplot_list$WF$SSU18,Module_Top5_Barplot_list$WF$SSU3,
Module_Top5_Barplot_list$WF$SSU13, Module_Top5_Barplot_list$WF$SSU14,
Module_Top5_Barplot_list$WF$SSU19,Module_Top5_Barplot_list$WF$SSU7,
nrow=1,
labels = c("WF5", "WF12", "WF18", "WF3", "WF13", "WF14", "WF19", "WF7")) -> part_2
cowplot::plot_grid(
Module_Top5_Barplot_list$WF$SSU22, Module_Top5_Barplot_list$WF$SSU11,
Module_Top5_Barplot_list$WF$SSU8, Module_Top5_Barplot_list$WF$SSU20,
Module_Top5_Barplot_list$WF$SSU6, Module_Top5_Barplot_list$WF$SSU1,
Module_Top5_Barplot_list$WF$SSU24, Module_Top5_Barplot_list$WF$SSU0,
nrow=1,
labels = c( "WF22", "WF11", "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_3
cowplot::plot_grid(part_1, part_2, part_3, nrow= 3) -> part_4
ggsave(part_4, filename = paste(paste("Module_topTaxa_plot_WF",
sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 16,height = 9)
plot(part_4)
cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU9, Module_Top5_Barplot_list$WF$SSU17,
Module_Top5_Barplot_list$WF$SSU4, Module_Top5_Barplot_list$WF$SSU16,
Module_Top5_Barplot_list$WF$SSU23, Module_Top5_Barplot_list$WF$SSU15,
Module_Top5_Barplot_list$WF$SSU21,
nrow=1,
labels = c("WF9", "WF17", "WF4", "WF16", "WF23", "WF15", "WF21")) -> part_1
cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU2, Module_Top5_Barplot_list$WF$SSU10,
Module_Top5_Barplot_list$WF$SSU5, Module_Top5_Barplot_list$WF$SSU12,
Module_Top5_Barplot_list$WF$SSU18,Module_Top5_Barplot_list$WF$SSU3,
nrow=1,
labels = c("WF2", "WF10", "WF5", "WF12", "WF18", "WF3")) -> part_2
cowplot::plot_grid( Module_Top5_Barplot_list$WF$SSU13, Module_Top5_Barplot_list$WF$SSU14,
Module_Top5_Barplot_list$WF$SSU19,Module_Top5_Barplot_list$WF$SSU7,
Module_Top5_Barplot_list$WF$SSU22, Module_Top5_Barplot_list$WF$SSU11,
nrow=1,
labels = c("WF13", "WF14", "WF19", "WF7", "WF22", "WF11", "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_3
cowplot::plot_grid( Module_Top5_Barplot_list$WF$SSU8, Module_Top5_Barplot_list$WF$SSU20,
Module_Top5_Barplot_list$WF$SSU6, Module_Top5_Barplot_list$WF$SSU1,
Module_Top5_Barplot_list$WF$SSU24, Module_Top5_Barplot_list$WF$SSU0,
nrow=1,
labels = c( "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_4
cowplot::plot_grid(part_1, part_2, part_3, part_4, nrow= 4) -> part_5
ggsave(part_5, filename = paste(paste("Module_topTaxa_plot_WF",
sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 12,height = 11)
Lets check our understanding of these measurements: 1) kTotal - connectivity of the each gene based on its r-values to all other genes in the whole network 2) kWithin - connectivity of the each gene within a single module based on its r-values to all other genes within the same module 3) and 4) kOut and kDiff mathematical derivatives from 1) and 2) So, let say WGCNA identified 10 modules, but kWithin for “Module 2” is the largest and obviously larger than kTotal. This suggest “Module 2” to be a core of the network, or more important. Let say You perform functional annotation of the genes in this module and will find them to be enriched in amino acid analysis. And Your study was dedicated to metabolic changes under some conditions. So You can make conclusion that Your condition strongly affects AA metabolism, and via it - whole network. In contrast, high kOut can suggest that total connectivity is much larger that connectivity within modules, and I would say - reflects sort of network’s stability under tested conditions. So a set of vertices with high kOut can be targets for annotation as hubs that determined this stability. And elimination of them (knockout mutations, for example) can break the whole network.
################################################
#Scatter GS vs ModuleMembership over all Traits#
################################################
Kin_data <- list()
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
require(WGCNA)
Kin_data[[Datasetname]] <- list()
omics_data <- WGCNA_list[[Datasetname]]$omics_data
datTraits <- WGCNA_list[[Datasetname]]$datTraits
moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
MEs <- WGCNA_list[[Datasetname]]$MEs
network <- WGCNA_list[[Datasetname]]$network
ps <- WGCNA_list[[Datasetname]]$ps
softPower <- WGCNA_list[[Datasetname]]$softPower
SSUWGCNAlist <- WGCNA_list[[Datasetname]]$SSUWGCNAlist
TaxAnno <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
ModsOfInterst_list <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest",
Datasetname, sep="_")]]
module_size<- as.data.frame(table(moduleLabels))
colnames(module_size) <- c("Module", "Size")
module_size %>% dplyr::mutate(Module_color = labels2colors(as.numeric(as.character(Module)))) ->
module_size
degrees <- intramodularConnectivity.fromExpr(as.data.frame(omics_data),
colors = moduleColors, power = softPower,
networkType = "signed", distFnc = "bicor")
degrees<- cbind(moduleLabels, degrees )
#############################################################################
#Creating GeneMembership and GeneSignificance Dataframes with GeneAnnotation#
#############################################################################
for (TraitOfInterest in colnames(datTraits)) {
if (length(rownames(ModsOfInterst_list[[TraitOfInterest]])) != 0) {
Kin_data[[Datasetname]][[TraitOfInterest]] <- list()
for (MODULE in rownames(ModsOfInterst_list[[TraitOfInterest]])) {
print(MODULE)
modNames = names(MEs) #substring(names(MEs), 2)
column = match(MODULE, modNames);
ASV <- names(moduleLabels[moduleLabels == sub("ME","", MODULE)])
moduleColor <- labels2colors(as.numeric(as.character( sub("ME","", MODULE))))
nSamples = nrow(omics_data)
Data <- as.data.frame(cbind(ASV, moduleColor))
Data <- Data %>%
left_join( #Add relative ASVmeans
data.frame(t(phyloseq::otu_table(ps %>%
transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
dplyr::mutate(ASVmeans = rowMeans(.)) %>%
dplyr::mutate(ASV = rownames(.)) %>%
dplyr::select(ASV, ASVmeans)) %>%
left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
as.data.frame() %>% #Create ModuleMembership
dplyr::select(paste("ME", sub("ME","", MODULE), sep="")) %>%
setNames(paste0("MM")) #No Individual naming here for Cytoscape columns beeing equal
MMPvalue <- #Creating p.values for Module memberships
as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) %>%
dplyr::select(paste("MM", sep="")) %>%
setNames(paste0("p.MM"))
geneTraitSignificance = as.data.frame(WGCNA::cor(omics_data, datTraits[TraitOfInterest], use = "p"));
GSPvalue = as.data.frame(WGCNA::corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = "GS"
names(GSPvalue) = "p.GS"
Data <- Data %>%
left_join(geneModuleMembership %>% rownames_to_column(var="ASV")) %>%
left_join(MMPvalue %>% rownames_to_column(var="ASV")) %>%
left_join(geneTraitSignificance %>% rownames_to_column(var="ASV")) %>%
left_join(GSPvalue %>% rownames_to_column(var="ASV")) %>%
left_join(degrees%>% rownames_to_column(var = "ASV"))
# left_join(SeasonSums) %>%
# left_join(RepSums)
#dplyr::arrange(desc(ASVmeans))
rownames(Data) <- Data$ASV
Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
Data$Trait <- TraitOfInterest
#######################
#Filter for Parameters#
#######################
#Data_filt <- Data %>%
# filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005)
Kin_data[[Datasetname]][[TraitOfInterest]][[MODULE]] <- Data
}
######
#Plot#
######
if (!is.null(Kin_data[[Datasetname]][[TraitOfInterest]])) {
BB <- do.call(rbind, Kin_data[[Datasetname]][[TraitOfInterest]])
}
BB$Modules <- sub("\\..*","",rownames(BB))
rownames(BB) <-NULL
#if (any(BB$moduleColor != "grey")) {
# BBB <- BB #[BB$moduleColor != "grey", ]
# } else {
BBB <- BB
# }
BBB$Modules <- sub(paste("_", TraitOfInterest, sep=""), "", BBB$Modules)
BBB$Modules <- sub("ME", "", BBB$Modules)
names(BBB) <- gsub(TraitOfInterest, "", names(BBB))
BBB$TraitOfInterest <- TraitOfInterest
require(cowplot)
tryCatch({
FILENAME <- paste(paste("MultiModuleScatter-kwithin", Datasetname, TraitOfInterest, sep="_"),
".png", sep="")
plot <- ggplot(data = BBB, aes(x = .data$MM, y = .data$GS)) +
geom_point(aes(colour = .data$moduleColor, size = ASVmeans), alpha = 0.5) +
scale_size_continuous(range = c(0.4, 4)) + # Adjust the size range
xlab(paste("Module Membership", sep = " ")) +
ylab(paste("ASV significance for", TraitOfInterest, sep = " ")) +
labs(title = "",#paste(Tissue, Type, "Module Membership vs. ASV Significance")
color = "sig. Modules") +
scale_color_identity(guide = 'legend',
breaks = unique(BBB$moduleColor),
labels = paste(Datasetname, unique(BBB$moduleLabels))) +
geom_hline(yintercept = 0, color = "red", linetype = "dashed")
label <- BBB %>%
dplyr::arrange(desc(ASVmeans)) %>%
dplyr::slice(c(1:5, (n() - 4):n()))
label2 <- BBB %>%
dplyr::arrange(desc(GS)) %>%
dplyr::slice(c(1:5, (n() - 4):n()))
label <- rbind(label, label2)
label <- label[!duplicated(label$ASV),]
plot <- plot +
ggrepel::geom_text_repel(
data = label, size = 4, aes(label = ASVname), color = OutlineColor)
# plot <- plot + ggrepel::geom_text_repel(
# data = BBB, size = 2.5, aes(label = ASVname), color = OutlineColor)
# plot <- plot + theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
# theme(
# panel.grid.major = element_line(colour = "grey50"),
# panel.grid.minor = element_line(colour = "grey50"),
# legend.position = "right")
plot <- plot + theme_minimal() +
#atheme+
#theme(legend.position = "none") +
theme(axis.title.x = element_blank()) +
theme(
#panel.grid.major = element_line(colour = "grey50"),
#panel.grid.minor = element_line(colour = "grey50"),
legend.position = "right",
legend.title = element_text( size = 12, face = "bold"),
legend.text = element_text(size = 12,face = "bold"),
axis.text.x.bottom = element_text(size = 15, face = "bold", angle = 45, hjust = 1),
axis.text.y.left = element_text(size = 15, face = "bold"),
axis.title.x = element_text(size = 15, face = "bold"), # Axis title for x
axis.title.y = element_text(size = 15, face = "bold"))
ggsave(plot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 6.5, height = 5)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
WGCNA_list[[Datasetname]][["Kin_data"]] <- Kin_data
}
## softConnectivity: FYI: connecitivty of genes with less than 46 valid samples will be returned as NA.
## ..calculating connectivities..
## [1] "ME6"
## [1] "ME11"
## [1] "ME6"
## [1] "ME12"
## [1] "ME11"
## [1] "ME9"
## [1] "ME6"
## [1] "ME11"
## [1] "ME5"
## [1] "ME1"
## [1] "ME13"
## [1] "ME8"
## [1] "ME13"
## [1] "ME6"
## [1] "ME7"
## [1] "ME1"
## [1] "ME3"
## [1] "ME7"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME5"
## [1] "ME1"
## [1] "ME9"
## [1] "ME13"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME9"
## [1] "ME13"
## [1] "ME6"
## [1] "ME7"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME13"
## [1] "ME6"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME9"
## [1] "ME6"
## [1] "ME11"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME4"
## [1] "ME0"
## softConnectivity: FYI: connecitivty of genes with less than 30 valid samples will be returned as NA.
## ..calculating connectivities..
## [1] "ME3"
## [1] "ME7"
## [1] "ME6"
## [1] "ME16"
## [1] "ME6"
## [1] "ME7"
## [1] "ME11"
## [1] "ME7"
## [1] "ME11"
## [1] "ME7"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME5"
## [1] "ME9"
## [1] "ME7"
## [1] "ME8"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME5"
## [1] "ME9"
## [1] "ME6"
## [1] "ME6"
## [1] "ME8"
## [1] "ME16"
## [1] "ME13"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME12"
## [1] "ME6"
## [1] "ME10"
## [1] "ME13"
## [1] "ME2"
## [1] "ME5"
## [1] "ME6"
## [1] "ME10"
## [1] "ME16"
## [1] "ME5"
## [1] "ME9"
## [1] "ME6"
## [1] "ME16"
## [1] "ME4"
## [1] "ME13"
## [1] "ME2"
## [1] "ME12"
## [1] "ME8"
## [1] "ME16"
## [1] "ME2"
## [1] "ME3"
## [1] "ME7"
## [1] "ME11"
Interest <- list("NO2" =
list("OE" = "ME7", "GC" = "ME2"),
"O2" =
list("OE" = "ME7", "GC" = "ME2"),
"Salinity" =
list("OE" = "ME3", "GC" = "ME7")
)
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
for (TraitOfInterest in names(Interest)) {
GS_SSU <- WGCNA_list[[Datasetname]]$Kin_data[[Datasetname]][[TraitOfInterest]]
MODULE <- Interest[[TraitOfInterest]][[Datasetname]]
GS_SSU <- GS_SSU[[MODULE]]
color_mapping<- pslist$Optics$colored_taxonomy_Fish
#GS_SSU <- do.call(rbind, Kin_data$OE$NO2$ME7) #%>%
#filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005 & GS < -0.2)
COL <- na.omit(color_mapping[unique(names(color_mapping))])
GS_SSU$Taxa <- GS_SSU$ASVname
GS_SSU <- left_join(GS_SSU, COL,copy = TRUE, keep = NULL)
GS_SSU$Color<- GS_SSU$Color %>% replace_na("grey50")
color_mapping <- setNames(GS_SSU$Color, GS_SSU$ASVname)
color_mapping[["Candidatus Megaira"]] <- "purple"
color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
color_mapping[["Shewanella"]] <- "#FF3366"
color_mapping[["Shewanella.baltica"]] <- "#FF0099"
color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
color_mapping[["Photobacterium"]] <- "#FFF000"
color_mapping[["Aeromonas"]] <- "#FFCC00"
BacteriaHostCorrelationPlot <- ggplot(data = GS_SSU, aes(x = MM, y = GS)) +
geom_point(aes(colour = ASVname, size = ASVmeans), alpha = 0.8) +
scale_size_continuous(range = c(1, 8)) + # Adjust the size range
xlab(paste("Intramodular correlation", sep = " ")) +
ylab(paste("Correlation to",unique(GS_SSU$Trait) , sep = " ")) +
labs(#title = paste("Bacterial abundance correlated to host Gill-", subset_df$RNA_Module[1],
# sep=""),
color = "Module") +
theme(legend.position = "none") +
scale_color_manual(values = color_mapping) +
#scale_color_identity(guide = 'legend',
# breaks = unique(GS_SSU$moduleColor),
# labels = paste(Datasetname, unique(GS_SSU$moduleLabels), sep="")) +
geom_hline(yintercept = 0, color = "red", size = 1,linetype = "dashed")
#BacteriaHostCorrelatonPlot <- BacteriaHostCorrelatonPlot +
# ggrepel::geom_text_repel(
# data = GS_SSU, size = 4, aes(label = ASVname), color = OutlineColor) #white
label <- GS_SSU %>%
dplyr::arrange(desc(ASVmeans)) %>%
head(10)
BacteriaHostCorrelationPlot <- BacteriaHostCorrelationPlot +
ggrepel::geom_text_repel(
data = label, size = 4, aes(label = ASVname), color = OutlineColor)
BacteriaHostCorrelationPlot <- BacteriaHostCorrelationPlot + theme_minimal() +
#atheme+
theme(legend.position = "none") +
theme(axis.title.x = element_blank()) +
theme(
#panel.grid.major = element_line(colour = "grey50"),
#panel.grid.minor = element_line(colour = "grey50"),
#legend.position = "right",
legend.title = element_text( size = 12, face = "bold"),
legend.text = element_text(size = 12,face = "bold"),
axis.text.x.bottom = element_text(size = 15, face = "bold", angle = 45, hjust = 1),
axis.text.y.left = element_text(size = 15, face = "bold"),
axis.title.x = element_text(size = 15, face = "bold"), # Axis title for x
axis.title.y = element_text(size = 15, face = "bold"))
ggsave(BacteriaHostCorrelationPlot, filename = paste(paste(save_name, "GS_SSU", Datasetname,
unique(GS_SSU$moduleLabels), unique(GS_SSU$Trait), sep="_"), ".png", sep="") ,
path = pathPlots, device='png', dpi=300, width = 4, height = 4)
plot(BacteriaHostCorrelationPlot)
}
}
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`
#Overall abundance per Module
ASVsums <- as.data.frame(sapply(WGCNA_list$OE$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums
## Module ASVsum
## 1 SSU6 38.16113612
## 2 SSU0 28.13226744
## 3 SSU5 6.52820305
## 4 SSU1 6.45842648
## 5 SSU9 5.11210527
## 6 SSU7 4.05596022
## 7 SSU4 3.73560975
## 8 SSU3 2.59369432
## 9 SSU2 1.99255136
## 10 SSU8 1.00034891
## 11 SSU10 0.86454418
## 12 SSU12 0.77910329
## 13 SSU11 0.49865859
## 14 SSU13 0.08739103
ASVsums <- as.data.frame(sapply(WGCNA_list$GC$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums
## Module ASVsum
## 1 SSU8 44.76220787
## 2 SSU0 19.00825798
## 3 SSU5 8.04758341
## 4 SSU11 6.79125083
## 5 SSU2 4.37238718
## 6 SSU4 4.10244216
## 7 SSU1 3.22182500
## 8 SSU3 3.06561530
## 9 SSU10 1.58117727
## 10 SSU14 1.21377615
## 11 SSU6 0.86314919
## 12 SSU7 0.84463946
## 13 SSU12 0.77722663
## 14 SSU9 0.71343565
## 15 SSU16 0.42434350
## 16 SSU13 0.13114311
## 17 SSU15 0.07953932
ASVsums <- as.data.frame(sapply(WGCNA_list$WF$SSUWGCNAlist, function(x) sum(x$ASVmeans)))
ASVsums<- ASVsums %>%
setNames("ASVsum") %>%
rownames_to_column(var = "Module") %>%
dplyr::arrange(desc(ASVsum))
ASVsums
## Module ASVsum
## 1 SSU4 18.67512697
## 2 SSU1 17.95200467
## 3 SSU3 12.69217555
## 4 SSU2 8.42571784
## 5 SSU6 8.08816629
## 6 SSU12 5.33509446
## 7 SSU0 5.15722962
## 8 SSU10 4.66452607
## 9 SSU7 3.98598817
## 10 SSU5 3.22604492
## 11 SSU9 2.83619097
## 12 SSU16 2.50752627
## 13 SSU8 2.05215664
## 14 SSU11 1.21562483
## 15 SSU22 0.77579042
## 16 SSU18 0.50431036
## 17 SSU13 0.35413346
## 18 SSU15 0.27380863
## 19 SSU14 0.26776191
## 20 SSU20 0.21968099
## 21 SSU17 0.21803867
## 22 SSU19 0.21790857
## 23 SSU23 0.19137072
## 24 SSU21 0.12892261
## 25 SSU24 0.03470041
####################################################
#Prevalence core vs network core & Bacterioplankton#
####################################################
for (Datasetname in names(WGCNA_list)) {
#Datasetname <- sub("ps_","", Dataset)
ps <- pslist[[paste("ps", Datasetname, sep="_")]]
if (Datasetname == "OE") {
Core_ME <- "6"
} else if (Datasetname == "GC") {
Core_ME <- "8"
}
SampleSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(SampleID) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
SeasonSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(Season) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
RepsSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(Replicates) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
################################################
#How much does the Core microbiome account for:
Core <- pslist[[paste("Core", Datasetname, sep="_")]] %>%
left_join( #Add relative ASVmeans
data.frame(t(phyloseq::otu_table(ps %>%
transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
dplyr::mutate(ASVmeans = rowMeans(.)) %>%
dplyr::mutate(ASV = rownames(.)) %>%
dplyr::select(ASV, ASVmeans))
Core_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste("SSU", Core_ME,
sep="")]]
print("Seasonal")
print(SeasonSums %>%
dplyr::filter(ASV %in% Core$ASV) %>%
dplyr::select(-ASV) %>%
colSums() %>%
as.data.frame()
)
print("Replicates")
print(
RepsSums %>%
dplyr::filter(ASV %in% Core$ASV) %>%
dplyr::select(-ASV) %>%
colSums() %>%
as.data.frame()
)
#############################
#Core Part of the Microbiome#
#############################
print(Datasetname)
print(paste("Core Taxa account for", round(sum(Core$ASVmeans), digits=2), "%", "of the",
Datasetname, "Microbiome"))
print(paste("Core Taxa", length(Core$ASVmeans)))
print("Sample summary")
print(SampleSums %>%
dplyr::filter(ASV %in% Core$ASV) %>%
dplyr::select(-ASV) %>%
colSums() %>%
summary())
print(paste("sd", sd(SampleSums %>%
dplyr::filter(ASV %in% Core$ASV) %>%
dplyr::select(-ASV) %>%
colSums())))
print("rel ASVmeans per Phylum in Core")
print(
Core %>%
dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::group_by(Phylum) %>%
dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
digits=2))%>%
dplyr::arrange(desc(rel_ASVmeans)) %>%
as.data.frame()
)
print("rel ASVmeans per Order in Core")
print(
Core %>%
dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::group_by(Order) %>%
dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
digits=2))%>%
dplyr::arrange(desc(rel_ASVmeans)) %>%
as.data.frame()
)
print("rel ASVmeans per Taxon in Core")
print(
Core %>%
dplyr::mutate(Taxon = gsub("ASV\\d+:", "", ASV)) %>%
dplyr::group_by(Taxon) %>%
dplyr::summarize(Taxon_means = sum(ASVmeans)) %>%
dplyr::arrange(desc(Taxon_means)) %>%
as.data.frame()
)
######################
#Module genes in Core#
######################
print(paste("ME Taxa", length(Core_ME_dat[[paste("ME", Core_ME,
sep="")]])))
length((Core$ASV %in% Core_ME_dat$ASV))
print(paste(round(length(Core[Core$ASV %in% Core_ME_dat$ASV,]$ASV)/ length(Core$ASV) *100, digits=2),"% Core ASVs in Module", Datasetname, Core_ME))
print(paste("Relative Abundance of", Datasetname, "Core Taxa in", Datasetname, Core_ME))
print(sum(Core[Core$ASV %in% Core_ME_dat$ASV,]$ASVmeans)/sum(Core$ASVmeans)*100)
print(paste("Relative Abundance of", Datasetname, Core_ME, "in Core Taxa in", Datasetname))
print(sum(Core_ME_dat[Core_ME_dat$ASV %in% Core$ASV,]$ASVmeans)/sum(Core_ME_dat$ASVmeans))
print("-")
#####################
#Core in Waterfilter#
#####################
WF_ASVmeans <- pslist$ps_WF %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
t() %>%
as.data.frame() %>%
dplyr::mutate(ASVmeans = rowMeans(.)) %>%
dplyr::mutate(ASV = rownames(.)) %>%
dplyr::select(ASV, ASVmeans)
print(paste("Amount of Taxa shared in Core and Bacterioplankton"))
print(sum(Core$ASV %in% rownames(t(otu_table(pslist$ps_WF)))/length(Core$ASV))*100)
print(paste("Relative Abundance of", Datasetname, "Core Taxa in", "in Bacterioplankton"))
print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Core$ASV,]$ASVmeans))
print(paste("Relative Abundance of", Datasetname, Core_ME, "in Bacterioplankton"))
print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Core_ME_dat$ASV,]$ASVmeans))
#In section Bacterioplankton vs Fish
# #################################
# #Overlapping Taxa betweem biomes#
# #################################
# Overlapping <- as.data.frame(Overlapping_Taxa) %>%
# `colnames<-`("ASV") %>%
# left_join( #Add relative ASVmeans
# data.frame(t(phyloseq::otu_table(ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
# dplyr::mutate(ASVmeans = rowMeans(.)) %>%
# dplyr::mutate(ASV = rownames(.)) %>%
# dplyr::select(ASV, ASVmeans))
# print(paste("Relative Abundance of Overlapping taxa between biomes in", Datasetname))
# print(sum(Overlapping$ASVmeans))
#
# print(paste("Relative Abundance of Overlapping taxa between biomes in Bacterioplankton"))
# print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in%Overlapping_Taxa,]$ASVmeans))
#
#
#
# RepsSums[RepsSums$ASV %in% Overlapping$ASV,] %>%
# dplyr::summarize(across(starts_with("avg_"), sum, na.rm = TRUE))%>%
# as.data.frame()
}
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
## .
## avg_Autumn_21 14.30
## avg_Spring_22 37.67
## avg_Summer_21 34.53
## avg_Winter_22 33.58
## [1] "Replicates"
## .
## avg_OEAU21BB 13.33
## avg_OEAU21MG 14.66
## avg_OEAU21ML 11.81
## avg_OEAU21SS 17.14
## avg_OEAU21TW 14.51
## avg_OESP22BB 43.35
## avg_OESP22MG 32.47
## avg_OESP22ML 38.52
## avg_OESP22SS 40.59
## avg_OESP22TW 33.41
## avg_OESU21BB 41.79
## avg_OESU21MG 38.99
## avg_OESU21ML 18.38
## avg_OESU21SS 37.98
## avg_OESU21TW 30.84
## avg_OEWI22BB 34.03
## avg_OEWI22MG 35.68
## avg_OEWI22ML 30.64
## avg_OEWI22SS 36.71
## avg_OEWI22TW 30.76
## [1] "OE"
## [1] "Core Taxa account for 29.95 % of the OE Microbiome"
## [1] "Core Taxa 27"
## [1] "Sample summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.40 19.23 32.55 29.95 40.83 52.36
## [1] "sd 12.130501795143"
## [1] "rel ASVmeans per Phylum in Core"
## Phylum rel_ASVmeans
## 1 Bacteroidota 60.57
## 2 Proteobacteria 27.03
## 3 Actinobacteriota 9.16
## 4 Deinococcota 3.23
## [1] "rel ASVmeans per Order in Core"
## Order rel_ASVmeans
## 1 Flavobacteriales 56.39
## 2 Enterobacterales 17.88
## 3 Micrococcales 7.61
## 4 Chitinophagales 4.18
## 5 Rhizobiales 3.76
## 6 Deinococcales 3.23
## 7 Pseudomonadales 2.96
## 8 Burkholderiales 1.31
## 9 Microtrichales 0.80
## 10 Sphingomonadales 0.60
## 11 Caulobacterales 0.52
## 12 Corynebacteriales 0.41
## 13 Propionibacteriales 0.34
## [1] "rel ASVmeans per Taxon in Core"
## Taxon Taxon_means
## 1 Elizabethkingia 15.6228863
## 2 Enterobacteriaceae 2.5116446
## 3 Enterobacter 1.9832069
## 4 Arthrobacter 1.7128911
## 5 Asinibacterium 1.2507845
## 6 Deinococcus 0.9689178
## 7 Lelliottia 0.8590406
## 8 Psychrobacter 0.6132957
## 9 Mesorhizobium 0.5088703
## 10 Weeksellaceae 0.4624175
## 11 Chryseobacterium.antarcticum 0.4033593
## 12 Chryseobacterium 0.4012136
## 13 Knoellia 0.3612595
## 14 Bradyrhizobium 0.3294495
## 15 Pseudomonas 0.2729780
## 16 Ilumatobacter 0.2392540
## 17 Agrococcus 0.2064247
## 18 Sphingomonas 0.1811390
## 19 Afipia 0.1592598
## 20 Caulobacteraceae 0.1568132
## 21 Comamonas.koreensis 0.1519231
## 22 Xanthobacteraceae 0.1288640
## 23 TRA3-20 0.1282404
## 24 Mycobacterium 0.1241212
## 25 GOUTA6 0.1122962
## 26 Cutibacterium 0.1007334
## [1] "ME Taxa 72"
## [1] "18.52 % Core ASVs in Module OE 6"
## [1] "Relative Abundance of OE Core Taxa in OE 6"
## [1] 70.54356
## [1] "Relative Abundance of OE 6 in Core Taxa in OE"
## [1] 0.5536707
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 33.33333
## [1] "Relative Abundance of OE Core Taxa in in Bacterioplankton"
## [1] 1.028735
## [1] "Relative Abundance of OE 6 in Bacterioplankton"
## [1] 0.1076743
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
## .
## avg_Autumn_21 58.77
## avg_Spring_22 60.44
## avg_Summer_21 36.21
## avg_Winter_22 44.01
## [1] "Replicates"
## .
## avg_GCAU21BB 50.56
## avg_GCAU21ML 61.51
## avg_GCAU21SS 67.55
## avg_GCAU21TW 51.08
## avg_GCSP22BB 65.14
## avg_GCSP22ML 59.31
## avg_GCSP22SS 56.72
## avg_GCSP22TW 60.78
## avg_GCSU21BB 58.98
## avg_GCSU21ML 22.60
## avg_GCSU21SS 41.44
## avg_GCSU21TW 32.41
## avg_GCWI22BB 34.89
## avg_GCWI22ML 44.45
## avg_GCWI22SS 49.25
## [1] "GC"
## [1] "Core Taxa account for 50.24 % of the GC Microbiome"
## [1] "Core Taxa 64"
## [1] "Sample summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.39 33.30 52.17 50.23 67.49 89.12
## [1] "sd 21.824715724556"
## [1] "rel ASVmeans per Phylum in Core"
## Phylum rel_ASVmeans
## 1 Proteobacteria 56.08
## 2 Bacteroidota 36.92
## 3 Actinobacteriota 5.16
## 4 Deinococcota 1.83
## [1] "rel ASVmeans per Order in Core"
## Order rel_ASVmeans
## 1 Enterobacterales 49.07
## 2 Flavobacteriales 36.00
## 3 Micrococcales 4.10
## 4 Burkholderiales 4.07
## 5 Pseudomonadales 2.11
## 6 Deinococcales 1.83
## 7 Rhizobiales 0.83
## 8 Chitinophagales 0.80
## 9 Corynebacteriales 0.55
## 10 Microtrichales 0.51
## 11 Sphingobacteriales 0.12
## [1] "rel ASVmeans per Taxon in Core"
## Taxon Taxon_means
## 1 Elizabethkingia 17.11836905
## 2 Enterobacteriaceae 12.11279350
## 3 Citrobacter 5.34848554
## 4 Enterobacter 4.11589997
## 5 Enterobacter.cancerogenus 1.30496659
## 6 Microvirgula 1.20803246
## 7 Lelliottia 1.12780959
## 8 Deinococcus 0.91853579
## 9 Arthrobacter 0.86307313
## 10 Pseudomonas 0.61334902
## 11 Weeksellaceae 0.40283825
## 12 Asinibacterium 0.40029341
## 13 Ornithinicoccus 0.39801685
## 14 Psychrobacter 0.37895266
## 15 Knoellia 0.37296312
## 16 Chryseobacterium.antarcticum 0.35067289
## 17 Agrococcus 0.26251153
## 18 Ilumatobacter 0.25751606
## 19 Yersinia 0.23108201
## 20 Chryseobacterium 0.21519016
## 21 Delftia 0.19187890
## 22 Comamonas.koreensis 0.18401844
## 23 Aeromonas 0.16625115
## 24 Intrasporangiaceae 0.16356181
## 25 Mycobacterium 0.15009177
## 26 Klebsiella 0.14766568
## 27 Achromobacter 0.14752195
## 28 Bradyrhizobium 0.12931774
## 29 Rhodococcus 0.12683854
## 30 GOUTA6 0.12291035
## 31 Xanthobacteraceae 0.11671916
## 32 Hyphomicrobium 0.11276910
## 33 TRA3-20 0.10544011
## 34 Serratia 0.09696374
## 35 Comamonas 0.08273035
## 36 Pseudomonas.batumici 0.06895924
## 37 Sphingobacterium 0.06154681
## 38 Pseudochrobactrum 0.05883710
## [1] "ME Taxa 61"
## [1] "68.75 % Core ASVs in Module GC 8"
## [1] "Relative Abundance of GC Core Taxa in GC 8"
## [1] 88.17333
## [1] "Relative Abundance of GC 8 in Core Taxa in GC"
## [1] 0.9895447
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 20.3125
## [1] "Relative Abundance of GC Core Taxa in in Bacterioplankton"
## [1] 1.198428
## [1] "Relative Abundance of GC 8 in Bacterioplankton"
## [1] 0.1146737
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
## .
## avg_Autumn_21 43.51
## avg_Spring_22 41.21
## avg_Summer_21 34.48
## avg_Winter_22 33.99
## [1] "Replicates"
## .
## avg_WFAU21TW 43.51
## avg_WFSP22BB 36.19
## avg_WFSP22MG 21.96
## avg_WFSP22ML 53.09
## avg_WFSP22SS 53.55
## avg_WFSU21BB 17.52
## avg_WFSU21MG 27.81
## avg_WFSU21ML 44.44
## avg_WFSU21SS 48.38
## avg_WFWI22BB 36.26
## avg_WFWI22MG 36.75
## avg_WFWI22ML 27.18
## avg_WFWI22SS 30.62
## [1] "WF"
## [1] "Core Taxa account for 37.64 % of the WF Microbiome"
## [1] "Core Taxa 170"
## [1] "Sample summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.38 29.21 36.61 37.63 46.41 54.79
## [1] "sd 11.5060520561194"
## [1] "rel ASVmeans per Phylum in Core"
## Phylum rel_ASVmeans
## 1 Actinobacteriota 39.24
## 2 Proteobacteria 35.31
## 3 Bacteroidota 7.40
## 4 Verrucomicrobiota 5.70
## 5 Nitrospirota 4.64
## 6 Desulfobacterota 2.30
## 7 Acidobacteriota 1.40
## 8 Gemmatimonadota 1.18
## 9 Cyanobacteria 0.91
## 10 Chloroflexi 0.60
## 11 Planctomycetota 0.54
## 12 Myxococcota 0.45
## 13 Firmicutes 0.32
## [1] "rel ASVmeans per Order in Core"
## Order rel_ASVmeans
## 1 Frankiales 24.40
## 2 Burkholderiales 20.21
## 3 Microtrichales 11.14
## 4 Verrucomicrobiales 5.70
## 5 Nitrospirales 4.64
## 6 Rhizobiales 4.55
## 7 Pseudomonadales 4.12
## 8 Chitinophagales 3.09
## 9 Cytophagales 3.04
## 10 Micrococcales 2.24
## 11 Steroidobacterales 1.73
## 12 Corynebacteriales 1.34
## 13 Flavobacteriales 1.27
## 14 Desulfobulbales 1.25
## 15 Gemmatimonadales 1.18
## 16 Rhodobacterales 1.16
## 17 Blastocatellales 1.13
## 18 Cyanobacteriales 0.91
## 19 Methylococcales 0.85
## 20 Sphingomonadales 0.80
## 21 Desulfuromonadales 0.75
## 22 Caulobacterales 0.62
## 23 Anaerolineales 0.60
## 24 Gemmatales 0.54
## 25 Polyangiales 0.45
## 26 Gammaproteobacteria Incertae Sedis 0.40
## 27 Acidiferrobacterales 0.36
## 28 Reyranellales 0.36
## 29 Peptostreptococcales-Tissierellales 0.32
## 30 Vicinamibacterales 0.27
## 31 Desulfatiglandales 0.18
## 32 Nitrosococcales 0.14
## 33 Desulfobacterales 0.13
## 34 Gaiellales 0.13
## [1] "rel ASVmeans per Taxon in Core"
## Taxon Taxon_means
## 1 hgcI clade 4.83727795
## 2 CL500-29 marine group 3.49193649
## 3 Sporichthyaceae 3.35574208
## 4 Limnohabitans 1.83683223
## 5 Persicirhabdus 1.81414932
## 6 Nitrospira 1.61253257
## 7 Candidatus Planktophila 0.99106296
## 8 TRA3-20 0.93151256
## 9 Rhizobiales Incertae Sedis 0.86384163
## 10 Polynucleobacter.difficilis 0.85347953
## 11 Candidatus Methylopumilus 0.77997753
## 12 Candidatus Limnoluna 0.76513131
## 13 Saprospiraceae 0.71370411
## 14 Ilumatobacter 0.59288656
## 15 Halioglobus 0.59154649
## 16 Methylotenera 0.54775641
## 17 Pseudohongiella 0.52498050
## 18 Mycobacterium 0.50457620
## 19 Woeseia 0.48299609
## 20 Pseudarcicella 0.47663873
## 21 Polynucleobacter 0.47629849
## 22 A0839 0.40863356
## 23 Hoppeia 0.38824829
## 24 Planktothrix NIVA-CYA 15 0.34424637
## 25 Gemmatimonadaceae 0.33580787
## 26 Candidatus Aquirestis 0.32856382
## 27 Limnohabitans.curvus 0.32365086
## 28 Methylobacter 0.32141650
## 29 Sphingorhabdus.wooponensis 0.30297773
## 30 Limnohabitans.planktonicus 0.29903573
## 31 Desulfobulbaceae 0.29254146
## 32 JGI 0001001-H03 0.25643981
## 33 Algoriphagus 0.25481414
## 34 Sva1033 0.23701807
## 35 Comamonadaceae 0.23133487
## 36 Anaerolineaceae 0.22564370
## 37 Boseongicola 0.21997164
## 38 Rhodobacteraceae 0.21780302
## 39 DEV007 0.20403019
## 40 Gemmataceae 0.20230230
## 41 OLB12 0.19960325
## 42 SC-I-84 0.18828257
## 43 Polynucleobacter.acidiphobus 0.18410785
## 44 Hirschia 0.17910742
## 45 Desulfocapsaceae 0.17710577
## 46 OM60(NOR5) clade 0.17542745
## 47 Sandaracinaceae 0.16849765
## 48 Steroidobacteraceae 0.16643033
## 49 Sutterellaceae 0.16057292
## 50 KI89A clade 0.15845129
## 51 oc32 0.15141838
## 52 Unknown Family 0.15141725
## 53 Sulfurifustis 0.13563609
## 54 Nitrospira.lenta 0.13314945
## 55 Xanthobacteraceae 0.12972283
## 56 Luteolibacter 0.12828495
## 57 Microscillaceae 0.12429488
## 58 Romboutsia 0.12233856
## 59 Ferruginibacter 0.12001499
## 60 Gemmatimonas 0.10977796
## 61 Sva0996 marine group 0.10960205
## 62 Vicinamibacteraceae 0.09979998
## 63 Blastocatella 0.09861016
## 64 Lautropia 0.09326635
## 65 Chryseolinea 0.09002443
## 66 GOUTA6 0.08954718
## 67 Eudoraea 0.08811628
## 68 GKS98 freshwater group 0.08571117
## 69 MND1 0.08568549
## 70 Hyphomicrobium 0.08292067
## 71 Candidatus Nitrotoga 0.07812251
## 72 Sulfuritalea 0.07784905
## 73 Rhodoluna.lacicola 0.07676456
## 74 KF-JG30-B3 0.07265650
## 75 Stenotrophobacter 0.07175602
## 76 Hyphomicrobiaceae 0.06950940
## 77 Reyranella 0.06919873
## 78 Desulfatiglans 0.06747465
## 79 Reyranella.aquatilis 0.06711296
## 80 Nitrosospira 0.06342258
## 81 Pedomicrobium 0.05506956
## 82 SWB02 0.05469795
## 83 SZB85 0.05417345
## 84 BD1-7 clade 0.05290782
## 85 Gaiella 0.04710704
## 86 Gven-F17 0.04627225
## 87 Geothermobacter 0.04587001
## 88 Ellin6067 0.04006136
## 89 Chthonobacter 0.03142548
## 90 Sva0081 sediment group 0.02972735
## 91 Nitrosomonadaceae 0.02830726
## 92 Desulfosarcina 0.01776697
## [1] "ME Taxa 66"
## [1] "1.18 % Core ASVs in Module WF 8"
## [1] "Relative Abundance of WF Core Taxa in WF 8"
## [1] 0.9622309
## [1] "Relative Abundance of WF 8 in Core Taxa in WF"
## [1] 0.1765058
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 100
## [1] "Relative Abundance of WF Core Taxa in in Bacterioplankton"
## [1] 37.64352
## [1] "Relative Abundance of WF 8 in Bacterioplankton"
## [1] 2.052157
##############################
#Sums for modules of interest#
##############################
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
#Datasetname <- sub("ps_","", Dataset)
ps <- pslist[[paste("ps", Datasetname, sep="_")]]
if (Datasetname == "OE") {
Interest_ME <- "0"
} else if (Datasetname == "GC") {
Interest_ME <- "7"
}
SampleSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(SampleID) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
SeasonSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(Season) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
RepsSums <- ps %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
as.data.frame() %>%
rownames_to_column(var = "SampleID") %>%
dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
dplyr::group_by(Replicates) %>%
dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
t() %>%
as.data.frame() %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
stats::setNames(paste0("avg_", colnames(.))) %>%
mutate_all(as.numeric) %>%
round(., digits = 2) %>%
rownames_to_column(var="ASV")
################################################
Interest_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste("SSU", Interest_ME,
sep="")]]
print("Seasonal")
print(SeasonSums %>%
dplyr::filter(ASV %in% Interest_ME_dat$ASV) %>%
dplyr::select(-ASV) %>%
colSums() %>%
as.data.frame()
)
print("Replicates")
print(
RepsSums %>%
dplyr::filter(ASV %in% Interest_ME_dat$ASV) %>%
dplyr::select(-ASV) %>%
colSums() %>%
as.data.frame()
)
print("rel ASVmeans per Order in Core")
print(
head(Interest_ME_dat %>%
dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::group_by(ASV) %>%
dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
digits=2))%>%
dplyr::arrange(desc(rel_ASVmeans)) %>%
as.data.frame() %>%
left_join(Interest_ME_dat[c("Phylum", "Class", "Order", "Family", "Genus", "Species", "ASV")])
))
print("rel ASVmeans per Genus in Core")
print(
head(Interest_ME_dat %>%
dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::group_by(Genus) %>%
dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
digits=2))%>%
dplyr::arrange(desc(rel_ASVmeans)) %>%
as.data.frame()
))
#####################
#Interest in Waterfilter#
#####################
WF_ASVmeans <- pslist$ps_WF %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>%
phyloseq::otu_table() %>%
t() %>%
as.data.frame() %>%
dplyr::mutate(ASVmeans = rowMeans(.)) %>%
dplyr::mutate(ASV = rownames(.)) %>%
dplyr::select(ASV, ASVmeans)
print(paste("Amount of Taxa shared", Datasetname, Interest_ME, "and Bacterioplankton"))
print(sum(Interest_ME_dat$ASV %in% rownames(t(otu_table(pslist$ps_WF)))/length(Interest_ME_dat$ASV ))*100)
print(paste("Relative Abundance of", Datasetname, Interest_ME, "in Bacterioplankton"))
print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Interest_ME_dat$ASV,]$ASVmeans))
}
## [1] "Seasonal"
## .
## avg_Autumn_21 49.31
## avg_Spring_22 19.98
## avg_Summer_21 20.75
## avg_Winter_22 21.16
## [1] "Replicates"
## .
## avg_OEAU21BB 51.81
## avg_OEAU21MG 30.52
## avg_OEAU21ML 55.70
## avg_OEAU21SS 55.72
## avg_OEAU21TW 52.76
## avg_OESP22BB 15.44
## avg_OESP22MG 23.03
## avg_OESP22ML 22.75
## avg_OESP22SS 17.97
## avg_OESP22TW 20.55
## avg_OESU21BB 17.00
## avg_OESU21MG 16.85
## avg_OESU21ML 29.97
## avg_OESU21SS 23.04
## avg_OESU21TW 19.19
## avg_OEWI22BB 22.16
## avg_OEWI22MG 24.00
## avg_OEWI22ML 22.43
## avg_OEWI22SS 19.43
## avg_OEWI22TW 18.48
## [1] "rel ASVmeans per Order in Core"
## Joining with `by = join_by(ASV)`
## ASV rel_ASVmeans Phylum
## 1 ASV11:Luteolibacter 8.39 Verrucomicrobiota
## 2 ASV16:Asinibacterium 4.45 Bacteroidota
## 3 ASV9:Clostridium sensu stricto 1 3.53 Firmicutes
## 4 ASV19:Endozoicomonas 3.48 Proteobacteria
## 5 ASV30:Deinococcus 2.26 Deinococcota
## 6 ASV33:Mesorhizobium 1.81 Proteobacteria
## Class Order Family
## 1 Verrucomicrobiae Verrucomicrobiales Rubritaleaceae
## 2 Bacteroidia Chitinophagales Chitinophagaceae
## 3 Clostridia Clostridiales Clostridiaceae
## 4 Gammaproteobacteria Pseudomonadales Endozoicomonadaceae
## 5 Deinococci Deinococcales Deinococcaceae
## 6 Alphaproteobacteria Rhizobiales Rhizobiaceae
## Genus Species
## 1 Luteolibacter <NA>
## 2 Asinibacterium <NA>
## 3 Clostridium sensu stricto 1 <NA>
## 4 Endozoicomonas <NA>
## 5 Deinococcus <NA>
## 6 Mesorhizobium <NA>
## [1] "rel ASVmeans per Genus in Core"
## Genus rel_ASVmeans
## 1 <NA> 14.43
## 2 Luteolibacter 13.25
## 3 Psychrobacter 7.32
## 4 Asinibacterium 4.53
## 5 Clostridium sensu stricto 1 4.47
## 6 Deinococcus 4.03
## [1] "Amount of Taxa shared OE 0 and Bacterioplankton"
## [1] 51.43639
## [1] "Relative Abundance of OE 0 in Bacterioplankton"
## [1] 33.79109
## [1] "Seasonal"
## .
## avg_Autumn_21 1.04
## avg_Spring_22 1.02
## avg_Summer_21 0.70
## avg_Winter_22 0.21
## [1] "Replicates"
## .
## avg_GCAU21BB 1.21
## avg_GCAU21ML 0.24
## avg_GCAU21SS 1.45
## avg_GCAU21TW 1.72
## avg_GCSP22BB 2.56
## avg_GCSP22ML 0.12
## avg_GCSP22SS 0.82
## avg_GCSP22TW 0.11
## avg_GCSU21BB 2.79
## avg_GCSU21ML 0.11
## avg_GCSU21SS 0.52
## avg_GCSU21TW 0.10
## avg_GCWI22BB 0.20
## avg_GCWI22ML 0.19
## avg_GCWI22SS 0.32
## [1] "rel ASVmeans per Order in Core"
## Joining with `by = join_by(ASV)`
## ASV rel_ASVmeans Phylum Class
## 1 ASV238:Halioglobus 5.24 Proteobacteria Gammaproteobacteria
## 2 ASV289:Ilumatobacter 4.51 Actinobacteriota Acidimicrobiia
## 3 ASV56:Luteolibacter 4.04 Verrucomicrobiota Verrucomicrobiae
## 4 ASV425:TRA3-20 4.03 Proteobacteria Gammaproteobacteria
## 5 ASV321:Persicirhabdus 3.71 Verrucomicrobiota Verrucomicrobiae
## 6 ASV537:TRA3-20 2.78 Proteobacteria Gammaproteobacteria
## Order Family Genus Species
## 1 Pseudomonadales Halieaceae Halioglobus <NA>
## 2 Microtrichales Ilumatobacteraceae Ilumatobacter <NA>
## 3 Verrucomicrobiales Rubritaleaceae Luteolibacter <NA>
## 4 Burkholderiales TRA3-20 <NA> <NA>
## 5 Verrucomicrobiales Rubritaleaceae Persicirhabdus <NA>
## 6 Burkholderiales TRA3-20 <NA> <NA>
## [1] "rel ASVmeans per Genus in Core"
## Genus rel_ASVmeans
## 1 <NA> 23.17
## 2 Halioglobus 17.63
## 3 Persicirhabdus 8.01
## 4 Ilumatobacter 6.69
## 5 Candidatus Symbiobacter 4.79
## 6 Luteolibacter 4.66
## [1] "Amount of Taxa shared GC 7 and Bacterioplankton"
## [1] 98.4127
## [1] "Relative Abundance of GC 7 in Bacterioplankton"
## [1] 5.696795
sum(rownames(WGCNA_list[["GC"]]$SSUWGCNAlist[[paste("SSU", 8,
sep="")]]) %in%
rownames(WGCNA_list[["OE"]]$SSUWGCNAlist[[paste("SSU", 6,
sep="")]]))/length(rownames(WGCNA_list[["GC"]]$SSUWGCNAlist[[paste("SSU", 8,
sep="")]]) )*100
## [1] 100
colored_taxonomy_Fish <- pslist$Optics$colored_taxonomy_Fish
##############
#Network CORE#
##############
TaxLevel <- "ASV"
flipped <- T
for (Dataset in names(pslist)[grepl("ps_GCWF|ps_OEWF", names(pslist))]) {
require(plyr); require(dplyr); require(ggrepel); require(cowplot); require(phyloseq)
Datasetname <- sub("ps_", "", Dataset)
# GroupOfInterest_OE <- na.omit(unique(WGCNA_list[["OE"]]$SSUWGCNAlist[["SSU7"]]$ASV))
# GroupOfInterest_GC <- na.omit(unique(WGCNA_list[["GC"]]$SSUWGCNAlist[["SSU2"]]$ASV))
# print(paste(sum(GroupOfInterest_OE %in% GroupOfInterest_GC)/length(GroupOfInterest_OE)*100, "% of OE Module SSU7 are in GC Module SSU2"))
#
# GroupOfInterest <- c(GroupOfInterest_OE, GroupOfInterest_GC)
# #GroupOfInterest <- sub(".*:", "", GroupOfInterest)
# GroupOfInterest <-unique(GroupOfInterest)
if (grepl("GC", Datasetname)) {
MODULE <- "SSU8"
} else if (grepl("OE", Datasetname)) {
MODULE <- "SSU6"
}
ps <- pslist[[Dataset]]
GroupOfInterest <- na.omit(unique(WGCNA_list[[sub("WF", "",
Datasetname)]]$SSUWGCNAlist[[MODULE]]$ASV))
FILENAME <- paste(paste(save_name, "Alpha_BarPlot_WF", MODULE, Datasetname, sep="_"), ".png", sep="")
if (Datasetname %in% c("GC", "OE")) {
WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
} else if (Datasetname %in% c("WF")) {
WIDTH <- 5 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
} else {
WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
}
################################
#Create Relative Abundance Data#
################################
ps_alpha_barplot <- ps %>%
#tax_glom(taxrank = TaxLevel) %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>% # Transform to rel. abundance
psmelt() %>% # Melt to long format
#filter(Abundance > 1) %>% # Filter out low abundance taxa
arrange(Genus) %>% # Sort data frame alphabetically by phylum
dplyr::arrange(desc(Abundance))
ps_alpha_barplot$ASV <- sub(".*:", "", ps_alpha_barplot$OTU)
#ps_alpha_barplot$ASV <- sub('\\.', ' ', ps_alpha_barplot$ASV)
############################
#Create TotalAbundance Data#
############################
# phylum_abundance <- ps_alpha_barplot %>%
# dplyr::group_by(.data[[TaxLevel]]) %>%
# dplyr::summarise(TotalAbundance = sum(Abundance))
# ordered_levels <- phylum_abundance %>%
# dplyr::arrange(desc(TotalAbundance)) %>%
# pull(.data[[TaxLevel]])
#
# ps_alpha_barplot$Taxa <- factor(ps_alpha_barplot[[TaxLevel]], levels = ordered_levels)
ps_alpha_barplot$SampleID <- factor(ps_alpha_barplot$SampleID, levels =
SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])])
SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])]
##########################
#Subset for Interest ASVs#
##########################
df <- ps_alpha_barplot #[ps_alpha_barplot$OTU %in% GroupOfInterest,]
#df <- df[df$OTU %in% GroupOfInterest,]
df$Abundance[!(df$OTU %in% GroupOfInterest)] <- 0
phylum_abundance <- df %>%
dplyr::group_by(.data[[TaxLevel]]) %>%
dplyr::summarise(TotalAbundance = sum(Abundance))
ordered_levels <- phylum_abundance %>%
dplyr::arrange(desc(TotalAbundance)) %>%
pull(.data[[TaxLevel]])
df$Taxa <- factor(df[[TaxLevel]], levels = ordered_levels)
Alpha_Diversity_df <- df
Alpha_Diversity_df$Colors <- NA
Alpha_Diversity_df$Reps <- factor(Alpha_Diversity_df$Reps, levels = RepOrder[RepOrder %in% Alpha_Diversity_df$Reps])
#################################
#Update to colored_taxonomy_Fish#
#################################
taxa_levels <- c("Genus", "Family", "Order", "Class", "Phylum")
taxa_level2 <- c("Taxa", "Phylum2")
Alpha_Diversity_df$Color <- "grey"
# Loop through each row of Alpha_Diversity_df
for (i in 1:nrow(Alpha_Diversity_df)) {
matching_color <- NA # Initialize matching color to NA
# Loop through each taxonomic level
for (level in taxa_levels) {
matching_row <- colored_taxonomy_Fish[colored_taxonomy_Fish$Taxa %in%
Alpha_Diversity_df[[level]][i], ]
# If there's a match, update matching_color and exit the loop
if (nrow(matching_row) > 0) {
matching_color <- matching_row$Color
break
}
}
# If there's no match at any level, try matching without numbers
if (is.na(matching_color)) {
matching_row <- colored_taxonomy_Fish[gsub("\\d", "", colored_taxonomy_Fish$Phylum2) %in%
Alpha_Diversity_df[["Phylum"]][i], ]
if (nrow(matching_row) > 0) {
matching_color <- matching_row$Color
}
}
# Assign the matching color to the corresponding row in Alpha_Diversity_df
Alpha_Diversity_df$Color[i] <- matching_color
}
color_mapping <- setNames(Alpha_Diversity_df$Color, Alpha_Diversity_df$Taxa)
color_mapping[["Candidatus Megaira"]] <- "purple"
color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
color_mapping[["Shewanella"]] <- "#FF3366"
color_mapping[["Shewanella.baltica"]] <- "#FF0099"
color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
color_mapping[["Photobacterium"]] <- "#FFF000"
color_mapping[["Aeromonas"]] <- "#FFCC00"
levels(Alpha_Diversity_df$Taxa)[!levels(Alpha_Diversity_df$Taxa) %in% head(unique(sub(".*:", "", GroupOfInterest)), 34) ] <-
"Other"
if (flipped == FALSE) {
if (Datasetname == "WF") {
p <- ggplot(Alpha_Diversity_df,
aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
facet_grid(. ~ factor(Season, levels = SeasonOrder), drop = TRUE, scale = "free",
space = "free_x")
COL <- col.Palette$col.Palette.Season[names(col.Palette$col.Palette.Season) %in%
unique(sample_data(ps)$Season)]
} else {
p <- ggplot(Alpha_Diversity_df,
aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
facet_grid(. ~ factor(Reps, levels = RepOrder), drop = TRUE, scale = "free", space = "free_y")
COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in% unique(sample_data(ps)$Reps)]
}
p <- p +
atheme +
ylab("Relative Abundance [%] \n") + xlab("") +
labs(fill="") +
scale_fill_manual(values = color_mapping) +
ylim(0, 60) +
#ggrepel::geom_text_repel(x = df$SampleID, y = df$Abundance, aes(label = Labels),
#inherit.aes = FALSE, min.segment.length = 0,nudge_y = 20,size=3, max.overlaps = Inf) +
#awhite +
theme(strip.text.y = element_text(angle = 0)) +
#theme(axis.text.x=element_blank(),
#axis.text.y=element_blank(),
#axis.title.y.left = element_blank(),
#axis.ticks.y =element_blank()
#) +
theme(
panel.background = element_rect(fill='transparent'), #transparent panel bg
plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
#legend.background = element_rect(fill='transparent'), #transparent legend bg
#legend.box.background = element_rect(fill='transparent') #transparent legend panel
) +
theme(axis.title.x.bottom = element_text(color="grey13"),
strip.text = element_text(color = "black", face= "bold")) +
theme(
legend.title=element_text(size=9),
legend.text=element_text(size=9),
axis.text.x.bottom = element_text(size= 9, face = "bold", angle = 45, hjust = 1),
strip.text.y.left = element_text(size = 9,face = "bold"),
axis.title.y.left = element_text(size=12,face = "bold"))
g <- ggplot_gtable(ggplot_build(p))
stripr <- which(grepl('strip-t', g$layout$name))
fills <- alpha(COL, 0.8)
k <- 1
for (iii in stripr) {
j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
k <- k+1}
Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300,
width = WIDTH, height = 6)
} else if (flipped == TRUE) {
# New facet label names for supp variable
COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in% unique(sample_data(ps)$Reps)]
Short.labs <- gsub("[^0-9]", "", Alpha_Diversity_df$Reps)
names(Short.labs) <- Alpha_Diversity_df$Reps
p <- ggplot(Alpha_Diversity_df, aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
coord_flip() +
#facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y")
facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y",
labeller = labeller(Reps = Short.labs))
p <- p +
ylab("Relative Abundance [%] \n") + xlab("") +
labs(fill="") + #atheme+
scale_fill_manual(values = color_mapping) +
ylim(0, 60) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
theme(
panel.background = element_rect(fill='transparent'), #transparent panel bg
plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
) +
theme(axis.title.x.bottom = element_text(color="grey13", size=15, face ="bold"),
strip.text = element_text(color = "black", face= "bold")) +
theme(
legend.title=element_text(size=9),
legend.text=element_text(size=9),
axis.text.x.bottom = element_text(size= 15, face = "bold", angle = 45, hjust = 1),
#strip.text.y.left = element_text(size = 9,face = "bold"),
#axis.title.y.left = element_text(size=12,face = "bold")
) +
theme(legend.position = "none")
g <- ggplot_gtable(ggplot_build(p))
stripr <- which(grepl('strip-l', g$layout$name))
fills <- alpha(COL, 0.8)
k <- 1
for (iii in stripr) {
j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
k <- k+1}
Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 2.7, height = HEIGHT)
}
}
##########
#Salinity#
##########
TaxLevel <- "ASV"
flipped <- T
for (Dataset in names(pslist)[grepl("ps_GCWF|ps_OEWF", names(pslist))]) {
require(plyr); require(dplyr); require(ggrepel); require(cowplot); require(phyloseq)
Datasetname <- sub("ps_", "", Dataset)
# GroupOfInterest_OE <- na.omit(unique(WGCNA_list[["OE"]]$SSUWGCNAlist[["SSU7"]]$ASV))
# GroupOfInterest_GC <- na.omit(unique(WGCNA_list[["GC"]]$SSUWGCNAlist[["SSU2"]]$ASV))
# print(paste(sum(GroupOfInterest_OE %in% GroupOfInterest_GC)/length(GroupOfInterest_OE)*100, "% of OE Module SSU7 are in GC Module SSU2"))
#
# GroupOfInterest <- c(GroupOfInterest_OE, GroupOfInterest_GC)
# #GroupOfInterest <- sub(".*:", "", GroupOfInterest)
# GroupOfInterest <-unique(GroupOfInterest)
if (grepl("GC", Datasetname)) {
MODULE <- "SSU7"
} else if (grepl("OE", Datasetname)) {
MODULE <- "SSU3"
}
ps <- pslist[[Dataset]]
GroupOfInterest <- na.omit(unique(WGCNA_list[[sub("WF", "",
Datasetname)]]$SSUWGCNAlist[[MODULE]]$ASV))
FILENAME <- paste(paste(save_name, "Alpha_BarPlot_WF", MODULE, Datasetname, sep="_"), ".png", sep="")
if (Datasetname %in% c("GC", "OE")) {
WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
} else if (Datasetname %in% c("WF")) {
WIDTH <- 5 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
} else {
WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
}
################################
#Create Relative Abundance Data#
################################
ps_alpha_barplot <- ps %>%
#tax_glom(taxrank = TaxLevel) %>%
transform_sample_counts(function(x) {x/sum(x)*100}) %>% # Transform to rel. abundance
psmelt() %>% # Melt to long format
#filter(Abundance > 1) %>% # Filter out low abundance taxa
arrange(Genus) %>% # Sort data frame alphabetically by phylum
dplyr::arrange(desc(Abundance))
ps_alpha_barplot$ASV <- sub(".*:", "", ps_alpha_barplot$OTU)
#ps_alpha_barplot$ASV <- sub('\\.', ' ', ps_alpha_barplot$ASV)
############################
#Create TotalAbundance Data#
############################
# phylum_abundance <- ps_alpha_barplot %>%
# dplyr::group_by(.data[[TaxLevel]]) %>%
# dplyr::summarise(TotalAbundance = sum(Abundance))
# ordered_levels <- phylum_abundance %>%
# dplyr::arrange(desc(TotalAbundance)) %>%
# pull(.data[[TaxLevel]])
#
# ps_alpha_barplot$Taxa <- factor(ps_alpha_barplot[[TaxLevel]], levels = ordered_levels)
ps_alpha_barplot$SampleID <- factor(ps_alpha_barplot$SampleID, levels =
SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])])
SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])]
##########################
#Subset for Interest ASVs#
##########################
df <- ps_alpha_barplot #[ps_alpha_barplot$OTU %in% GroupOfInterest,]
#df <- df[df$OTU %in% GroupOfInterest,]
df$Abundance[!(df$OTU %in% GroupOfInterest)] <- 0
phylum_abundance <- df %>%
dplyr::group_by(.data[[TaxLevel]]) %>%
dplyr::summarise(TotalAbundance = sum(Abundance))
ordered_levels <- phylum_abundance %>%
dplyr::arrange(desc(TotalAbundance)) %>%
pull(.data[[TaxLevel]])
df$Taxa <- factor(df[[TaxLevel]], levels = ordered_levels)
Alpha_Diversity_df <- df
Alpha_Diversity_df$Colors <- NA
Alpha_Diversity_df$Reps <- factor(Alpha_Diversity_df$Reps, levels = RepOrder[RepOrder %in% Alpha_Diversity_df$Reps])
#################################
#Update to colored_taxonomy_Fish#
#################################
taxa_levels <- c("Genus", "Family", "Order", "Class", "Phylum")
taxa_level2 <- c("Taxa", "Phylum2")
Alpha_Diversity_df$Color <- "grey"
# Loop through each row of Alpha_Diversity_df
for (i in 1:nrow(Alpha_Diversity_df)) {
matching_color <- NA # Initialize matching color to NA
# Loop through each taxonomic level
for (level in taxa_levels) {
matching_row <- colored_taxonomy_Fish[colored_taxonomy_Fish$Taxa %in%
Alpha_Diversity_df[[level]][i], ]
# If there's a match, update matching_color and exit the loop
if (nrow(matching_row) > 0) {
matching_color <- matching_row$Color
break
}
}
# If there's no match at any level, try matching without numbers
if (is.na(matching_color)) {
matching_row <- colored_taxonomy_Fish[gsub("\\d", "", colored_taxonomy_Fish$Phylum2) %in%
Alpha_Diversity_df[["Phylum"]][i], ]
if (nrow(matching_row) > 0) {
matching_color <- matching_row$Color
}
}
# Assign the matching color to the corresponding row in Alpha_Diversity_df
Alpha_Diversity_df$Color[i] <- matching_color
}
color_mapping <- setNames(Alpha_Diversity_df$Color, Alpha_Diversity_df$Taxa)
color_mapping[["Candidatus Megaira"]] <- "purple"
color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
color_mapping[["Shewanella"]] <- "#FF3366"
color_mapping[["Shewanella.baltica"]] <- "#FF0099"
color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
color_mapping[["Photobacterium"]] <- "#FFF000"
color_mapping[["Aeromonas"]] <- "#FFCC00"
levels(Alpha_Diversity_df$Taxa)[!levels(Alpha_Diversity_df$Taxa) %in% head(unique(sub(".*:", "", GroupOfInterest)), 34) ] <-
"Other"
if (flipped == FALSE) {
if (Datasetname == "WF") {
p <- ggplot(Alpha_Diversity_df,
aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
facet_grid(. ~ factor(Season, levels = SeasonOrder), drop = TRUE, scale = "free",
space = "free_x")
COL <- col.Palette$col.Palette.Season[names(col.Palette$col.Palette.Season) %in%
unique(sample_data(ps)$Season)]
} else {
p <- ggplot(Alpha_Diversity_df,
aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
facet_grid(. ~ factor(Reps, levels = RepOrder), drop = TRUE, scale = "free", space = "free_y")
COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in% unique(sample_data(ps)$Reps)]
}
p <- p +
atheme +
ylab("Relative Abundance [%] \n") + xlab("") +
labs(fill="") +
scale_fill_manual(values = color_mapping) +
ylim(0, 60) +
#ggrepel::geom_text_repel(x = df$SampleID, y = df$Abundance, aes(label = Labels),
#inherit.aes = FALSE, min.segment.length = 0,nudge_y = 20,size=3, max.overlaps = Inf) +
#awhite +
theme(strip.text.y = element_text(angle = 0)) +
#theme(axis.text.x=element_blank(),
#axis.text.y=element_blank(),
#axis.title.y.left = element_blank(),
#axis.ticks.y =element_blank()
#) +
theme(
panel.background = element_rect(fill='transparent'), #transparent panel bg
plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
#legend.background = element_rect(fill='transparent'), #transparent legend bg
#legend.box.background = element_rect(fill='transparent') #transparent legend panel
) +
theme(axis.title.x.bottom = element_text(color="grey13"),
strip.text = element_text(color = "black", face= "bold")) +
theme(
legend.title=element_text(size=9),
legend.text=element_text(size=9),
axis.text.x.bottom = element_text(size= 9, face = "bold", angle = 45, hjust = 1),
strip.text.y.left = element_text(size = 9,face = "bold"),
axis.title.y.left = element_text(size=12,face = "bold"))
g <- ggplot_gtable(ggplot_build(p))
stripr <- which(grepl('strip-t', g$layout$name))
fills <- alpha(COL, 0.8)
k <- 1
for (iii in stripr) {
j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
k <- k+1}
Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300,
width = WIDTH, height = 6)
} else if (flipped == TRUE) {
# New facet label names for supp variable
COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in% unique(sample_data(ps)$Reps)]
Short.labs <- gsub("[^0-9]", "", Alpha_Diversity_df$Reps)
names(Short.labs) <- Alpha_Diversity_df$Reps
p <- ggplot(Alpha_Diversity_df, aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
geom_bar(stat = "identity") +
coord_flip() +
#facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y")
facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y",
labeller = labeller(Reps = Short.labs))
p <- p +
ylab("Relative Abundance [%] \n") + xlab("") +
labs(fill="") + #atheme+
scale_fill_manual(values = color_mapping) +
ylim(0, 60) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
theme(
panel.background = element_rect(fill='transparent'), #transparent panel bg
plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
) +
theme(axis.title.x.bottom = element_text(color="grey13", size=15, face ="bold"),
strip.text = element_text(color = "black", face= "bold")) +
theme(
legend.title=element_text(size=9),
legend.text=element_text(size=9),
axis.text.x.bottom = element_text(size= 15, face = "bold", angle = 45, hjust = 1),
#strip.text.y.left = element_text(size = 9,face = "bold"),
#axis.title.y.left = element_text(size=12,face = "bold")
) +
theme(legend.position = "none")
g <- ggplot_gtable(ggplot_build(p))
stripr <- which(grepl('strip-l', g$layout$name))
fills <- alpha(COL, 0.8)
k <- 1
for (iii in stripr) {
j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
k <- k+1}
Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 2.7, height = HEIGHT)
}
}
#-
Excluded for knitting, works fine, just un-# the whole chunk
# #You need to download and Open Cytoscape on your computer for this to work
# #http://www.bioconductor.org/packages/devel/bioc/vignettes/rWikiPathways/inst/doc/Pathway-Analysis.html
# #https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html
# library(tidyverse)
# library(dplyr)
# library(plyr)
# library(clusterProfiler)
# library(enrichplot)
# library(ggplot2)
# library("org.Hs.eg.db")
# library("org.Dr.eg.db")
# library("org.Mm.eg.db")
# library(DOSE)
# library(rWikiPathways)
# library(jsonlite)
# library("DOSE")
# library("GO.db")
# library("GSEABase")
# library("dplyr")
# library("tidyr")
# library("stringr")
# library("RColorBrewer")
# library("rWikiPathways")
# library("RCy3")
#
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
# cytoscapePing()
# # #list of app to install Do that once!
# # installApp('WikiPathways')
# # installApp('CyTargetLinker')
# # installApp('stringApp')
# # installApp('enrichmentMap')
# # installApp('autoannotate')
# # installApp('wordcloud')
# # installApp('stringapp')
# # installApp('aMatReader')
# # installApp('clustermaker2')
# # cytoscapeVersionInfo ()
# # installApp('STRINGapp')
# # installApp('aMatReader')
# # installApp('clusterMaker2')
# # The following setting is important, do not omit.
# library(WGCNA)
# options(stringsAsFactors = FALSE);
# for (Datasetname in names(WGCNA_list)) {
#
# omics_data <- WGCNA_list[[Datasetname]]$omics_data
# datTraits <- WGCNA_list[[Datasetname]]$datTraits
# moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
# moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
# MEs <- WGCNA_list[[Datasetname]]$MEs
# network <- WGCNA_list[[Datasetname]]$network
# ps <- WGCNA_list[[Datasetname]]$ps
# softPower <- WGCNA_list[[Datasetname]]$softPower
# GeneAnno <- WGCNA_list$WF$SSUWGCNAlist
# TaxAnno <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
#
# RepSums <- ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100}) %>%
# phyloseq::otu_table() %>%
# as.data.frame() %>%
# rownames_to_column(var = "SampleID") %>%
# dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
# dplyr::group_by(Replicates) %>%
# dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
# t() %>%
# as.data.frame() %>%
# `colnames<-`(.[1, ]) %>%
# .[-1, ] %>%
# stats::setNames(paste0("avg_", colnames(.))) %>%
# mutate_all(as.numeric) %>%
# round(., digits = 2) %>%
# rownames_to_column(var="ASV")
#
# SeasonSums <- ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100}) %>%
# phyloseq::otu_table() %>%
# as.data.frame() %>%
# rownames_to_column(var = "SampleID") %>%
# dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
# dplyr::group_by(Season) %>%
# dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
# t() %>%
# as.data.frame() %>%
# `colnames<-`(.[1, ]) %>%
# .[-1, ] %>%
# stats::setNames(paste0("avg_", colnames(.))) %>%
# mutate_all(as.numeric) %>%
# round(., digits = 2) %>%
# rownames_to_column(var="ASV")
#
# ############################################################################
# #Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
# ############################################################################
# for (i in names(MEs)) {
# tryCatch({
#
# ModuleOfInterst <- paste(sub("ME", "", i))
#
# TOM = TOMsimilarityFromExpr(omics_data, power = softPower, nThreads =8);
# modules = as.numeric(sub("ME", "", i))
#
# colors <- labels2colors(modules)
# genes <- names(omics_data)
# inModule <- is.finite(match(moduleColors, colors));
# ASV <- genes[inModule];
# modColors <- moduleColors[inModule]
# Data <- as.data.frame(cbind(ASV, modColors))
#
# Data <- Data %>%
# left_join( #Add relative ASVmeans
# data.frame(t(phyloseq::otu_table(ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
# dplyr::mutate(ASVmeans = rowMeans(.)) %>%
# dplyr::mutate(ASV = rownames(.)) %>%
# dplyr::select(ASV, ASVmeans)) %>%
# left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
#
# geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
# as.data.frame() %>% #Create ModuleMembership
# dplyr::select(paste("ME", ModuleOfInterst, sep="")) %>%
# setNames(paste0("MM")) #No Individual naming here for Cytoscape columns beeing equal
#
# MMPvalue <- #Creating p.values for Module memberships
# as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) %>%
# dplyr::select(paste("MM", sep="")) %>%
# setNames(paste0("p.MM"))
#
# #setNames(paste0(sub("MM", "p.MM", names(.)), sep="")) %>%
# #dplyr::select(paste("p.MM", sep=""))
#
# Data <- Data %>%
# left_join(geneModuleMembership %>% rownames_to_column(var="ASV")) %>%
# left_join(MMPvalue %>% rownames_to_column(var="ASV")) %>%
# left_join(SeasonSums) %>%
# left_join(RepSums)
# #dplyr::arrange(desc(ASVmeans))
# rownames(Data) <- Data$ASV
#
# Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
#
#
# # Select the corresponding Topological Overlap
# TOM_mod = TOM[inModule, inModule];
# dimnames(TOM_mod) = list(Data$ASV, Data$ASV)
#
# #######################
# #Filter for Parameters#
# #######################
# Data_filt <- Data %>%
# filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005)
# TOM_filt <- TOM_mod[Data_filt$ASV, Data_filt$ASV]
# # Export the network into edge and node list files Cytoscape can read
# #https://support.bioconductor.org/p/95965/
# # My operational answer is to select a threshold that keeps the file size manageable, then use filtering in Cytoscape to interactively choose a threshold that results in an informative plot. This assumes that Cytoscape is only used for visualization; if you're going to use Cytoscape for further analysis, you should probably set the threshold to zero.
# #https://www.biostars.org/p/9514423/
#
# cyt = exportNetworkToCytoscape(TOM_filt,
# edgeFile = paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."),
# nodeFile = paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."),
# weighted = TRUE,
# threshold = 0.01, #Doing real Filtering in Cytoscape for Abundance
# nodeNames = Data_filt$ASV,
# altNodeNames = Data_filt$ASVname,
# nodeAttr = Data_filt)
# }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})}
# }
#
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
# cytoscapePing()
# ##############################
# #Import Networks to Cytoscape#
# ##############################
# library("RCy3")
# cytoscapePing () # make sure cytoscape is open
# cytoscapeVersionInfo ()
# for (Datasetname in names(WGCNA_list)) {
#
# omics_data <- WGCNA_list[[Datasetname]]$omics_data
# datTraits <- WGCNA_list[[Datasetname]]$datTraits
# moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
# moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
# MEs <- WGCNA_list[[Datasetname]]$MEs
#
#
# for (i in names(MEs)) {
# tryCatch({
#
# modules = as.numeric(sub("ME", "", i))
# colors = labels2colors(modules)
#
# node <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"),
# paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#
# edge <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"),
# paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#
#
# #colnames(node)
# #colnames(node) <- c("id","altName","node_attributes")
#
# node <- node %>%
# dplyr::mutate(id = nodeName) %>%
# dplyr::select(id, everything())
#
# edge <- edge %>%
# rename("fromNode" = "source") %>%
# rename("toNode" = "target") %>%
# rename("direction" = "interaction")
#
# createNetworkFromDataFrames(node, edge, title= paste(modules, Datasetname, sep="_"),
# collection=paste("DataFrame", Datasetname, sep="_"))
#
#
# ###########################
# #Set some style parameters#
# ###########################
# #getVisualPropertyNames()
# setVisualStyle('Sample1')
# # set up my own style
# style.name = paste("Style", Datasetname, sep="_")
# defaults <- list( #NODE_SHAPE="diamond",
# NODE_SIZE=10,
# NODE_FILL_COLOR ="black",
# EDGE_TRANSPARENCY=120
# #NODE_LABEL_POSITION="W,E,c,0.00,0.00"
# )
# nodeLabels <- mapVisualProperty('node label','id','p')
# arrowShapes <- mapVisualProperty('Edge Target Arrow Shape','interaction','d',c("activates","inhibits","interacts"),c("Arrow","T","None"))
#
# edgeWidth <- mapVisualProperty('edge width','weight','p')
#
# nodeFills <- mapVisualProperty('node fill color','Phylum','d', names(phylum_colors_Cytoscape), as.character(phylum_colors_Cytoscape))
#
# #nodeSize<- mapVisualProperty('NODE_SIZE','avg_Spring_22','c',c(0, 100),c(10, 1000))
#
# nodeSize <- mapVisualProperty('NODE_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 1000))
#
# nodeLabelSize <- mapVisualProperty('NODE_LABEL_FONT_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 100))
#
# nodeLabel<-mapVisualProperty('node label','ASVname','p')
#
# createVisualStyle(style.name, defaults, list(nodeLabels,arrowShapes, nodeFills, nodeLabel, nodeSize, edgeWidth, nodeLabelSize))
#
# setVisualStyle(style.name)
#
# }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
# }
# }
#
#
# # For Visualization in Figure 3B change to cytoscape and do:
# # Cytoscape -> Import -> Table -> SSU_Data with Column relMeanASV -> To selected network only
# #Styles -> LabelSize and LableText continous
# #https://www.biostars.org/p/192885/#192910
# #https://www.biostars.org/p/454313/
#
# #https://manual.cytoscape.org/en/stable/Styles.html#tutorial-3-creating-a-new-style-with-a-continuous-mapping
# #https://manual.cytoscape.org/en/stable/Styles.html
# #https://www.biostars.org/p/454866/
#
#
# #You need to download and Open Cytoscape on your computer for this to work
# #http://www.bioconductor.org/packages/devel/bioc/vignettes/rWikiPathways/inst/doc/Pathway-Analysis.html
# #https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html
#
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
# cytoscapePing()
# library(WGCNA)
# options(stringsAsFactors = FALSE);
# for (Datasetname in names(WGCNA_list)) {
#
# omics_data <- WGCNA_list[[Datasetname]]$omics_data
# datTraits <- WGCNA_list[[Datasetname]]$datTraits
# moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
# moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
# MEs <- WGCNA_list[[Datasetname]]$MEs
# network <- WGCNA_list[[Datasetname]]$network
# ps <- WGCNA_list[[Datasetname]]$ps
# softPower <- WGCNA_list[[Datasetname]]$softPower
# GeneAnno <- WGCNA_list$WF$SSUWGCNAlist
# TaxAnno <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
#
# RepSums <- ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100}) %>%
# phyloseq::otu_table() %>%
# as.data.frame() %>%
# rownames_to_column(var = "SampleID") %>%
# dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
# dplyr::group_by(Replicates) %>%
# dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
# t() %>%
# as.data.frame() %>%
# `colnames<-`(.[1, ]) %>%
# .[-1, ] %>%
# stats::setNames(paste0("avg_", colnames(.))) %>%
# mutate_all(as.numeric) %>%
# round(., digits = 2) %>%
# rownames_to_column(var="ASV")
#
# SeasonSums <- ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100}) %>%
# phyloseq::otu_table() %>%
# as.data.frame() %>%
# rownames_to_column(var = "SampleID") %>%
# dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
# dplyr::group_by(Season) %>%
# dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>%
# t() %>%
# as.data.frame() %>%
# `colnames<-`(.[1, ]) %>%
# .[-1, ] %>%
# stats::setNames(paste0("avg_", colnames(.))) %>%
# mutate_all(as.numeric) %>%
# round(., digits = 2) %>%
# rownames_to_column(var="ASV")
#
# ############################################################################
# #Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
# ############################################################################
#
# TOM = TOMsimilarityFromExpr(omics_data, power = softPower, nThreads =8);
# modules = as.numeric(sub("ME", "", i))
#
#
# genes <- names(omics_data)
# inModule <- is.finite(match(moduleColors, moduleColors));
# ASV <- genes[inModule];
# modColors <- moduleColors[inModule]
# Data <- as.data.frame(cbind(ASV, modColors))
#
# Data <- Data %>%
# left_join( #Add relative ASVmeans
# data.frame(t(phyloseq::otu_table(ps %>%
# transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
# dplyr::mutate(ASVmeans = rowMeans(.)) %>%
# dplyr::mutate(ASV = rownames(.)) %>%
# dplyr::select(ASV, ASVmeans)) %>%
# left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
#
# geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
# as.data.frame() #%>% #Create ModuleMembership
# #dplyr::select(paste("ME", ModuleOfInterst, sep="")) %>%
# #setNames(paste0("MM")) #No Individual naming here for Cytoscape columns beeing equal
#
# MMPvalue <- #Creating p.values for Module memberships
# as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) #%>%
# #dplyr::select(paste("MM", sep="")) %>%
# names(MMPvalue) <- paste(sub("ME", "p.ME", names(MMPvalue)))
#
#
# #setNames(paste0(sub("MM", "p.MM", names(.)), sep="")) %>%
# #dplyr::select(paste("p.MM", sep=""))
#
# Data <- Data %>%
# dplyr::left_join(geneModuleMembership %>% tibble::rownames_to_column(var="ASV")) %>%
# left_join(MMPvalue %>% rownames_to_column(var="ASV")) %>%
# left_join(SeasonSums) %>%
# left_join(RepSums)
# #dplyr::arrange(desc(ASVmeans))
# rownames(Data) <- Data$ASV
#
# Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
#
#
# # Select the corresponding Topological Overlap
# TOM_mod = TOM[inModule, inModule];
# dimnames(TOM_mod) = list(Data$ASV, Data$ASV)
#
# #######################
# #Filter for Parameters#
# #######################
# #Data_filt <- Data %>%
# # filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005)
#
# Data_filt <- Data %>%
# group_by(modColors) %>%
# top_n(50, ASVmeans)
#
# TOM_filt <- TOM_mod[Data_filt$ASV, Data_filt$ASV]
#
#
#
#
# # Export the network into edge and node list files Cytoscape can read
# #https://support.bioconductor.org/p/95965/
# # My operational answer is to select a threshold that keeps the file size manageable, then use filtering in Cytoscape to interactively choose a threshold that results in an informative plot. This assumes that Cytoscape is only used for visualization; if you're going to use Cytoscape for further analysis, you should probably set the threshold to zero.
# #https://www.biostars.org/p/9514423/
#
# cyt = exportNetworkToCytoscape(TOM_filt,
# edgeFile = paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."),
# nodeFile = paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."),
# weighted = TRUE,
# threshold = 0.1, #Doing real Filtering in Cytoscape for Abundance
# nodeNames = Data_filt$ASV,
# altNodeNames = Data_filt$ASVname,
# nodeAttr = Data_filt)
# }
#
# ###########
# #Cytoscape#
# ###########
# ##############################
# #Import Networks to Cytoscape#
# ##############################
# library("RCy3")
# cytoscapePing () # make sure cytoscape is open
# cytoscapeVersionInfo ()
# for (Datasetname in names(WGCNA_list)) {
#
# omics_data <- WGCNA_list[[Datasetname]]$omics_data
# datTraits <- WGCNA_list[[Datasetname]]$datTraits
# moduleLabels <- WGCNA_list[[Datasetname]]$moduleLabels
# moduleColors <- WGCNA_list[[Datasetname]]$moduleColors
# MEs <- WGCNA_list[[Datasetname]]$MEs
#
# modules = as.numeric(sub("ME", "", i))
# colors = labels2colors(modules)
#
# node <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"),
# paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#
# edge <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"),
# paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#
#
# #colnames(node)
# #colnames(node) <- c("id","altName","node_attributes")
#
# node <- node %>%
# dplyr::mutate(id = nodeName) %>%
# dplyr::select(id, everything())
#
#
# edge <- edge %>% plyr::rename(c("fromNode" = "source",
# "toNode" = "target",
# "direction" = "interaction"))
#
#
# createNetworkFromDataFrames(node, edge, title= paste(Datasetname, sep="_"),
# collection=paste("DataFrame", Datasetname, sep="_"))
#
#
# ###########################
# #Set some style parameters#
# ###########################
# #getVisualPropertyNames()
# setVisualStyle('Sample1')
# # set up my own style
# style.name = paste("Style_overall", Datasetname, sep="_")
# defaults <- list( #NODE_SHAPE="diamond",
# NODE_SIZE=10,
# NODE_FILL_COLOR ="black",
# EDGE_TRANSPARENCY=120
# #NODE_LABEL_POSITION="W,E,c,0.00,0.00"
# )
# nodeLabels <- mapVisualProperty('node label','id','p')
# arrowShapes <- mapVisualProperty('Edge Target Arrow Shape','interaction','d',c("activates","inhibits","interacts"),c("Arrow","T","None"))
#
# edgeWidth <- mapVisualProperty('edge width','weight','p')
#
# nodeFills <- mapVisualProperty('node fill color','Phylum','d', names(phylum_colors_Cytoscape), as.character(phylum_colors_Cytoscape))
#
# #nodeSize<- mapVisualProperty('NODE_SIZE','avg_Spring_22','c',c(0, 100),c(10, 1000))
#
# nodeSize <- mapVisualProperty('NODE_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 1000))
#
# nodeLabelSize <- mapVisualProperty('NODE_LABEL_FONT_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 100))
#
# nodeLabel<-mapVisualProperty('node label','ASVname','p')
#
# createVisualStyle(style.name, defaults, list(nodeLabels,arrowShapes, nodeFills, nodeLabel, nodeSize, edgeWidth, nodeLabelSize))
#
# setVisualStyle(style.name)
#
# }
#
#
#
# # For Visualization in Figure 3B change to cytoscape and do:
# # Cytoscape -> Import -> Table -> SSU_Data with Column relMeanASV -> To selected network only
# #Styles -> LabelSize and LableText continous
# #https://www.biostars.org/p/192885/#192910
# #https://www.biostars.org/p/454313/
#
# #https://manual.cytoscape.org/en/stable/Styles.html#tutorial-3-creating-a-new-style-with-a-continuous-mapping
# #https://manual.cytoscape.org/en/stable/Styles.html
# #https://www.biostars.org/p/454866/
#-
PICRUST2 https://github.com/picrust/picrust2/wiki/Full-pipeline-script
How to interpret PICRUSt2 output The most important thing to keep in mind when running PICRUSt2 is that predictions of functional potential are output. This data can be useful for generating hypotheses, but should always be interpreted cautiously especially when focused on a single function or predictions for a single ASV (or OTU). You should read through the key limitations of metagenome inference before proceeding. The final output tables produced by PICRUSt2 are essentially the read depth per ASV multiplied by the predicted function abundances per ASV. The output is calculated slightly differently depending on the file as described below, but that is the basic idea. Accordingly, the final tables are not output in terms of relative abundance and the abundances per sample will sum to different numbers.
There are many possible ways to analyze the output of PICRUSt2, such as rounding the data and using the ALDEx2 R package. However, no matter what approach you use it is important to keep in mind that the data is compositional. This means that it would be inappropriate to run a Wilcoxon test (for example) on the output table without first transforming the data somehow (e.g. to relative abundance, centred-log ratio transformation, etc.).
The Enzyme Commission (EC) [4] has classified all enzymes based on the enzymatic reactions they catalyze. Each enzyme has an EC number, which is a hierarchical number that distinguishes enzymes by the type of reactions they catalyze.
awk ‘{print $1 }’ ps_asv.fna > seqs1.fna sed ‘s/://g’ < seqs1.fna > seqs.fna
The input files should be a FASTA of amplicon sequences variants (ASVs; i.e. your representative sequences, not your raw reads, which is study_seqs.fna below) and a BIOM table of the abundance of each ASV across each sample (study_seqs.biom below). Note that a tab-delimited table with ASV ids as the first column and sample abundances as all subsequent columns will also work.
Version: 2.3.0_b mamba create -n picrust2 -c bioconda picrust2=2.3.0_b
Not installable on Apple M1 atm also not downgrading to 2.3.0_b because it requires pyhton <2.7 which also does not exist for M1 mamba create -n picrust2 -c bioconda -c conda-forge picrust2=2.5.2
Also on Cluster Biom-table seems to have a problem
Does not work atm
https://github.com/cozygene/FEAST https://www.nature.com/articles/s41592-019-0431-x
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10125963/
# library(FEAST)
# metadata <- Load_metadata(metadata_path = "/Users/admin/Downloads/metadata_example_multi.txt")
# otus <- Load_CountMatrix(CountMatrix_path = "/Users/admin/Downloads/otu_example_multi.txt")
#
# FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = "~/FEAST/Data_files/", outfile="demo")
#
# FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1)
#
# devtools::install_github("cozygene/FEAST", ref = "FEAST")
#saveRDS(pslist, file.path(path_Output_16S,paste(paste(save_name, "16S_merge_Filter_ASV_besttax", Date, sep="_"), ".rds", sep="")))
# pslistraw <- readRDS(file.path(path_Output_16S, paste(paste(save_name,"ps_16S_merge_pslistraw", Date, sep="_"), ".rds", sep="")))
#
# pslist <- readRDS(file.path(path_Output_16S,paste(paste(save_name, "16S_merge_Filter_ASV_besttax", Date, sep="_"), ".rds", sep="")))
#
# WGCNA_list <- readRDS(file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list", Date, sep="_"), ".rds", sep=""))))
when necessary
# require(tidyverse)
# pslist <- pslist[!grepl("TSE|clr", names(pslist))]
# names(pslist)
#
# for (i in names(pslist)[grepl("ps", names(pslist))]){
# ps <- pslist[[i]]
# g<- names(pslist[i])
#
# A<-SAMDF16S[SAMDF16S$SampleID %in% rownames(sample_data(ps)),]
# Sample_Data <- as.data.frame(sample_data(ps))
# A<-left_join(A, Sample_Data[,names(Sample_Data) %in% c("SampleID", "nonchim", "Run", "Input")])
# rownames(A) <- A$SampleID
#
# #PHY <-phy_tree(ps)
# OTU <- otu_table(ps, taxa_are_rows=FALSE)
# TAX <- tax_table(ps)
# pslist[[i]] <- phyloseq(otu_table(OTU, taxa_are_rows=FALSE),
# sample_data(A), tax_table(TAX)) #, phy_tree(PHY)
# }
#
# for (i in names(pslist)[grepl("ps", names(pslist))]){
# g <- paste(i)
# a <- length(pslist)
# ps <- pslist[[i]]
# ps_clr <- microbiome::transform(ps, "clr")
# pslist[[a+1]] <- ps_clr
# names(pslist)[[a+1]] <- paste("clr", sub("ps", "\\1", g), sep="")}
#
# for (i in names(pslist)[grepl("clr", names(pslist))]){
# g <- paste(i)
# a <- length(pslist)
# clr <- pslist[[i]]
# TSE<-mia::makeTreeSummarizedExperimentFromPhyloseq(clr)
# pslist[[a+1]] <- TSE
# names(pslist)[[a+1]] <- paste("TSEc.l.r", sub("clr", "\\1", g), sep="")}
#-